MySQL教程:Group By用法
GROUP BY (Aggregate) Functions
mysql group_by 弄出來的是只有各一筆的資料, 所以通常會根據某種統計一起用count, max, sum
ruby group_by api
2009年4月30日 星期四
2009年4月29日 星期三
virtualhost and rewrite
很多virtualhost 的例子可以參考
VirtualHost Examples
apache Module Rewrite - URL Rewriting
RewriteCond Directive
rewrite rule
last|L (last rule)
Stop the rewriting process here and don't apply any more rewrite rules. This corresponds to the Perl last command or the break command in C. Use this flag to prevent the currently rewritten URL from being rewritten further by following rules. For example, use it to rewrite the root-path URL ('/') to a real one, e.g., '/e/www/'.
VirtualHost Examples
apache Module Rewrite - URL Rewriting
RewriteCond Directive
rewrite rule
last|L (last rule)
Stop the rewriting process here and don't apply any more rewrite rules. This corresponds to the Perl last command or the break command in C. Use this flag to prevent the currently rewritten URL from being rewritten further by following rules. For example, use it to rewrite the root-path URL ('/') to a real one, e.g., '/e/www/'.
Advanced Rails , Database Chapter memo(4)
Magic Multi-Connections
這樣也有辦法讀寫分離!!
Magic Multi-Connections gem allows you to connect to different databases concurrently
from the same application. This is very useful when using one master and several
read-only slaves serving the same models.
For a single-master situation, you could define another database connection in
database.yml for the read slave:
This database is backed by a module, which mirrors the ActiveRecord classes using
this database connection:
Caching
用memcache 啦, 這邊推薦的是用 acts_as_cached plugin
Load Balancing and High Availability
Load balancing
Spreading request load over several systems so as to reduce the load placed on a
single system.
High availability
Resiliency to the failure of one or several constituent components; the ability to
continue providing services without interruption despite component failure.
MySQL
Replication
Master-slave replication works well for load balancing in applications
where reads outnumber writes, since all writes must be applied to the master.
However, master-slave replication as described does not provide high availability;
there is a single master that is a single point of failure.
The MySQL documentation suggests setting up a dynamicDNS entry pointing to the current master; however, this will introduce another potential failure point.
MySQL cluster
The primary high-availability solution for MySQL is the MySQL Cluster technology,
available since version 4.1. Cluster is primarily an in-memory database, though as of
version 5, disk storage is supported. The Cluster product is based on the NDB storage
engine, backed by data nodes.
For high availability, at least three physical servers must be used: two data nodes and
a management node. The management node is needed to arbitrate between the two
data nodes if they become disconnected and out of synchronization with each other.
A replication factor of 2 is used, so the two data nodes must each have enough memory
to hold the working set, unless disk storage is used.
LDAP
LDAP, the Lightweight Directory Access Protocol, is a database system optimized for
user directory information. It is most often used in large organizations, integrated
with the enterprise authentication and email systems. However, it is a database in its
own right. We do not have space to cover LDAPin detail, but there are many
resources available for working with LDAP in Rails.
ActiveLDAP
The ActiveLDAPlibrary (http://ruby-activeldap.rubyforge.org/) is an almost drop-in
replacement for ActiveRecord that uses LDAPinstead of an RDBMS as a backend.
這樣也有辦法讀寫分離!!
Magic Multi-Connections gem allows you to connect to different databases concurrently
from the same application. This is very useful when using one master and several
read-only slaves serving the same models.
For a single-master situation, you could define another database connection in
database.yml for the read slave:
This database is backed by a module, which mirrors the ActiveRecord classes using
this database connection:
Caching
用memcache 啦, 這邊推薦的是用 acts_as_cached plugin
Load Balancing and High Availability
Load balancing
Spreading request load over several systems so as to reduce the load placed on a
single system.
High availability
Resiliency to the failure of one or several constituent components; the ability to
continue providing services without interruption despite component failure.
MySQL
Replication
Master-slave replication works well for load balancing in applications
where reads outnumber writes, since all writes must be applied to the master.
However, master-slave replication as described does not provide high availability;
there is a single master that is a single point of failure.
The MySQL documentation suggests setting up a dynamicDNS entry pointing to the current master; however, this will introduce another potential failure point.
MySQL cluster
The primary high-availability solution for MySQL is the MySQL Cluster technology,
available since version 4.1. Cluster is primarily an in-memory database, though as of
version 5, disk storage is supported. The Cluster product is based on the NDB storage
engine, backed by data nodes.
For high availability, at least three physical servers must be used: two data nodes and
a management node. The management node is needed to arbitrate between the two
data nodes if they become disconnected and out of synchronization with each other.
A replication factor of 2 is used, so the two data nodes must each have enough memory
to hold the working set, unless disk storage is used.
LDAP
LDAP, the Lightweight Directory Access Protocol, is a database system optimized for
user directory information. It is most often used in large organizations, integrated
with the enterprise authentication and email systems. However, it is a database in its
own right. We do not have space to cover LDAPin detail, but there are many
resources available for working with LDAP in Rails.
ActiveLDAP
The ActiveLDAPlibrary (http://ruby-activeldap.rubyforge.org/) is an almost drop-in
replacement for ActiveRecord that uses LDAPinstead of an RDBMS as a backend.
2009年4月28日 星期二
2009年4月27日 星期一
Advanced Rails , Database Chapter memo(2)
Large/Binary Objects
Database Storage
Sooner or later, many web applications must deal with the issue of LOB (large object) data.
LOB storage is usually divided into CLOB (character large object) for text data and
BLOB (binary large object) for everything else. Some DBMSs separate the two as separate
data types. CLOB types can often be indexed, collated, and searched; BLOBs
cannot.
PostgreSQL
My recommendation is to use filesystem storage for all binary objects if you use
PostgreSQL. Although the database might be the more proper place for this type of
data, it just does not work well enough yet.
MySQL
MySQL suffers from issues similar to PostgreSQL with streaming data, and it is always more awkward for a web application to stream data from the database than from the filesystem.
Filesystem Storage
The reality is that filesystem storage is the best option, as a general rule. Filesystems
are optimized to handle large amounts of binary and/or character data, and they are
fast at it.
Why Is Filesystem Storage So Fast?
the secret to this performance, under Linux and various BSDs, is the kernel
sendfile( ) syscall (not to be confused with X-Sendfile, discussed later). The sendfile( )
function copies data quickly from a file descriptor (which represents an open file) to a
socket (which is connected to the client). This happens in kernel mode, not user mode—
the entire process is handled by the operating system. The web server doesn’t even have
to think about it. When sendfile( ) is invoked
Sending Data with X-Sendfile
Often you will need to send a file to the client for download after doing some processing
in Rails. The easy way to do this is with the send_file or send_data API calls, which stream data from the server to the client:
The X-Sendfile protocol is a very simple standard, first introduced in the Lighttpd
web server, which directs the web server to send a file from the filesystem to the client
rather than a response generated by the application server. Because the web
server is optimized for throwing files at the client, this usually yields a decent speed
improvement over reading the file into memory and sending it from Rails with the
send_file or send_data API calls.
X-Sendfile uses the X-Sendfile HTTPheader pointing to the server’s path to the file
to send, in conjunction with the other standard HTTPheaders. A typical response
using X-Sendfile would look something like this:
From Rails, you can set the response headers by modifying the response.headers hash:
Web server configuration
Of course, the front end web server must be properly configured to recognize and
process the X-Sendfile header
Database Storage
Sooner or later, many web applications must deal with the issue of LOB (large object) data.
LOB storage is usually divided into CLOB (character large object) for text data and
BLOB (binary large object) for everything else. Some DBMSs separate the two as separate
data types. CLOB types can often be indexed, collated, and searched; BLOBs
cannot.
PostgreSQL
My recommendation is to use filesystem storage for all binary objects if you use
PostgreSQL. Although the database might be the more proper place for this type of
data, it just does not work well enough yet.
MySQL
MySQL suffers from issues similar to PostgreSQL with streaming data, and it is always more awkward for a web application to stream data from the database than from the filesystem.
Filesystem Storage
The reality is that filesystem storage is the best option, as a general rule. Filesystems
are optimized to handle large amounts of binary and/or character data, and they are
fast at it.
Why Is Filesystem Storage So Fast?
the secret to this performance, under Linux and various BSDs, is the kernel
sendfile( ) syscall (not to be confused with X-Sendfile, discussed later). The sendfile( )
function copies data quickly from a file descriptor (which represents an open file) to a
socket (which is connected to the client). This happens in kernel mode, not user mode—
the entire process is handled by the operating system. The web server doesn’t even have
to think about it. When sendfile( ) is invoked
Sending Data with X-Sendfile
Often you will need to send a file to the client for download after doing some processing
in Rails. The easy way to do this is with the send_file or send_data API calls, which stream data from the server to the client:
The X-Sendfile protocol is a very simple standard, first introduced in the Lighttpd
web server, which directs the web server to send a file from the filesystem to the client
rather than a response generated by the application server. Because the web
server is optimized for throwing files at the client, this usually yields a decent speed
improvement over reading the file into memory and sending it from Rails with the
send_file or send_data API calls.
X-Sendfile uses the X-Sendfile HTTPheader pointing to the server’s path to the file
to send, in conjunction with the other standard HTTPheaders. A typical response
using X-Sendfile would look something like this:
From Rails, you can set the response headers by modifying the response.headers hash:
Web server configuration
Of course, the front end web server must be properly configured to recognize and
process the X-Sendfile header
Advanced Rails , Database Chapter memo(1)
AcitveRecord 支援的database
PostgreSQL
PostgreSQL’s support for concurrency is more mature than MySQL’s. Postgres supports
multiversion concurrency control (MVCC), which is even more advanced than
row-level locking.
One advantage that PostgreSQL may have in the enterprise is its similarity to commercial
enterprise databases such as Oracle, MS SQL Server, or DB2.
Multiversion Concurrency Control
MVCC gives each transaction a snapshot of the data it accesses, as the data existed at the start of the transaction. The transaction performs actions on the data, which are logged with timestamps
The primary advantage that MVCC has over locking is that MVCC does not block readers.
MVCC uses more storage space than locking because it has to store a snapshot for each in-progress transaction. And though MVCC never blocks reads, the DBMS may roll back update transactions if they cause a conflict.
MySQL
Use InnoDB for absolutely anything where data integrity or concurrency matter.
MyISAM, the default engine on most MySQL installations, does not support features
that most RDBMSs consider essential: foreign key constraints, row-level
locking, and transactions.
Unfortunately, InnoDB can be much slower than MyISAM, and the table sizes
are usually several times larger
if raw speed of reads or writes is the primary concern, MyISAM can
help. For example, a logging server for web analytics might use MyISAM tables:
you want to be able to dump logs into it as fast as possible, and reads are performed
far less often than writes.
Set the SQL mode to TRADITIONAL. This can be accomplished with the following
command:
SET GLOBAL sql_mode='TRADITIONAL';
This will make MySQL a little bit more strict, raising errors on incorrect data rather than silently discarding it.
PostgreSQL
PostgreSQL’s support for concurrency is more mature than MySQL’s. Postgres supports
multiversion concurrency control (MVCC), which is even more advanced than
row-level locking.
One advantage that PostgreSQL may have in the enterprise is its similarity to commercial
enterprise databases such as Oracle, MS SQL Server, or DB2.
Multiversion Concurrency Control
MVCC gives each transaction a snapshot of the data it accesses, as the data existed at the start of the transaction. The transaction performs actions on the data, which are logged with timestamps
The primary advantage that MVCC has over locking is that MVCC does not block readers.
MVCC uses more storage space than locking because it has to store a snapshot for each in-progress transaction. And though MVCC never blocks reads, the DBMS may roll back update transactions if they cause a conflict.
MySQL
Use InnoDB for absolutely anything where data integrity or concurrency matter.
MyISAM, the default engine on most MySQL installations, does not support features
that most RDBMSs consider essential: foreign key constraints, row-level
locking, and transactions.
Unfortunately, InnoDB can be much slower than MyISAM, and the table sizes
are usually several times larger
if raw speed of reads or writes is the primary concern, MyISAM can
help. For example, a logging server for web analytics might use MyISAM tables:
you want to be able to dump logs into it as fast as possible, and reads are performed
far less often than writes.
Set the SQL mode to TRADITIONAL. This can be accomplished with the following
command:
SET GLOBAL sql_mode='TRADITIONAL';
This will make MySQL a little bit more strict, raising errors on incorrect data rather than silently discarding it.
ROR項目中連接多個數據庫的方法, abstract_class, establish_connection
ROR項目中連接多個數據庫的方法
如何連結到舊系統的資料庫
The self.abstract_class = true statements tell ActiveRecord that the LegacyDb classes cannot be instantiated themselves; since they represent database connections,they are not backed by concrete tables in the database
如何連結到舊系統的資料庫
The self.abstract_class = true statements tell ActiveRecord that the LegacyDb classes cannot be instantiated themselves; since they represent database connections,they are not backed by concrete tables in the database
2009年4月26日 星期日
SSL FAQ
SSL FAQ
什麼是 SSL?
Secure Sockets Layer (SSL) 協定最初係由1996年 Netscape 公司所提出,現已成為網路上進行加密通訊之全球化標準。在SSL協定中, 伺服器需先安裝SSL伺服器數位憑證才能啟動SSL功能。SSL伺服器數位憑證能讓用戶端去認得伺服器端真實身分,但先決條件是這需要核發該 SSL 伺服器數位憑證之憑證機構為一個公正且可受信賴的憑證機構(CA)。其將進行以下三個重要的步驟:
SSL 伺服器鑑別 – 讓使用者確認伺服器的身份。SSL伺服器數位憑證上會記載要啟動SSL加密功能的伺服器相關資料;因此當如以網頁瀏覽器(IE、Netscape)登入啟動SSL的網站,瀏覽器程式會自動依列在這類程式中受信任的憑證機構 (CA) 清單中來自動檢查伺服器的數位憑證是否有效,例如VeriSign。鑑別伺服器身份在安全電子商務交易環境中,對於使用者是非常重要的一環;例如,當使用者欲傳送自己的信用卡資訊給網站伺服器之前一定會想先確認該伺服器的真實身份為何。
啟動加密SSL連線 - 要求所有在用戶端及伺服器間所傳遞資訊需透過SSL來進行加密,如此才可確保資訊不會在網路上遭到攔截。
資料完整性確認 - SSL 會自動偵測資料於傳遞途中是否有遭到修改的危險。如此一來,使用者便可以安心地將個人私密資料 (如信用卡資訊) 透過網際網路進行傳輸,並信任 SSL 機制會保護這些資訊的隱私及安全性。
SSL 如何運作?
SSL Handshake 是進行SSL加密通訊的重要過程, 其包括以下步驟:
1. 用戶連接到您的網站,並存取具有安全性 URL 位置;即由SSL伺服器數位憑證所保護的網頁。
(可由查看 URL 開頭是否為 "https: " 而非 "http:" 來進行辨識,或瀏覽器會提供您相關的訊息) 。
2. 您的伺服器進行回應,並自動傳送您網站的數位憑證給用戶,用以鑑別您的網站身分。
3. 用戶的網頁瀏覽器程式產生一把唯一的 "階段金鑰",用以跟網站之間所有的通訊過程進行加密。
4. 使用者的瀏覽器以網站的公開金鑰對階段金鑰進行加密,以便只有讓您的網站得以閱讀此階段金鑰。
5. 現在,具有安全性的階段作業已經建立。此完整過程僅需幾秒鐘時間,使用者不需進行任何動作。依不同的瀏覽器程式而定,使用者會在瀏覽器上看到一個鑰匙圖示變完整,或一個門栓圖示變成上鎖,用以表示目前工作階段是具有安全性的。
什麼是 SSL?
Secure Sockets Layer (SSL) 協定最初係由1996年 Netscape 公司所提出,現已成為網路上進行加密通訊之全球化標準。在SSL協定中, 伺服器需先安裝SSL伺服器數位憑證才能啟動SSL功能。SSL伺服器數位憑證能讓用戶端去認得伺服器端真實身分,但先決條件是這需要核發該 SSL 伺服器數位憑證之憑證機構為一個公正且可受信賴的憑證機構(CA)。其將進行以下三個重要的步驟:
SSL 伺服器鑑別 – 讓使用者確認伺服器的身份。SSL伺服器數位憑證上會記載要啟動SSL加密功能的伺服器相關資料;因此當如以網頁瀏覽器(IE、Netscape)登入啟動SSL的網站,瀏覽器程式會自動依列在這類程式中受信任的憑證機構 (CA) 清單中來自動檢查伺服器的數位憑證是否有效,例如VeriSign。鑑別伺服器身份在安全電子商務交易環境中,對於使用者是非常重要的一環;例如,當使用者欲傳送自己的信用卡資訊給網站伺服器之前一定會想先確認該伺服器的真實身份為何。
啟動加密SSL連線 - 要求所有在用戶端及伺服器間所傳遞資訊需透過SSL來進行加密,如此才可確保資訊不會在網路上遭到攔截。
資料完整性確認 - SSL 會自動偵測資料於傳遞途中是否有遭到修改的危險。如此一來,使用者便可以安心地將個人私密資料 (如信用卡資訊) 透過網際網路進行傳輸,並信任 SSL 機制會保護這些資訊的隱私及安全性。
SSL 如何運作?
SSL Handshake 是進行SSL加密通訊的重要過程, 其包括以下步驟:
1. 用戶連接到您的網站,並存取具有安全性 URL 位置;即由SSL伺服器數位憑證所保護的網頁。
(可由查看 URL 開頭是否為 "https: " 而非 "http:" 來進行辨識,或瀏覽器會提供您相關的訊息) 。
2. 您的伺服器進行回應,並自動傳送您網站的數位憑證給用戶,用以鑑別您的網站身分。
3. 用戶的網頁瀏覽器程式產生一把唯一的 "階段金鑰",用以跟網站之間所有的通訊過程進行加密。
4. 使用者的瀏覽器以網站的公開金鑰對階段金鑰進行加密,以便只有讓您的網站得以閱讀此階段金鑰。
5. 現在,具有安全性的階段作業已經建立。此完整過程僅需幾秒鐘時間,使用者不需進行任何動作。依不同的瀏覽器程式而定,使用者會在瀏覽器上看到一個鑰匙圖示變完整,或一個門栓圖示變成上鎖,用以表示目前工作階段是具有安全性的。
Connecting to the MySQL Server
Connecting to the MySQL Server
On Unix, MySQL programs treat the host name localhost specially, in a way that is likely different from what you expect compared to other network-based programs. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file. This occurs even if a --port or -P option is given to specify a port number. To ensure that the client makes a TCP/IP connection to the local server, use --host or -h to specify a host name value of 127.0.0.1, or the IP address or name of the local server. You can also specify the connection protocol explicitly, even for localhost, by using the --protocol=TCP option. For example:
shell> mysql --host=127.0.0.1
shell> mysql --protocol=TCP
Connections to remote servers always use TCP/IP. This command connects to the server running on remote.example.com using the default port number (3306):
shell> mysql --host=remote.example.com
To specify a port number explicitly, use the --port or -P option:
shell> mysql --host=remote.example.com --port=13306
You can specify a port number for connections to a local server, too. However, as indicated previously, connections to localhost on Unix will use a socket file by default. You will need to force a TCP/IP connection as already described or any option that specifies a port number will be ignored.
For this command, the program uses a socket file on Unix and the --port option is ignored:
shell> mysql --port=13306 --host=localhost
To cause the port number to be used, invoke the program in either of these ways:
shell> mysql --port=13306 --host=127.0.0.1
shell> mysql --port=13306 --protocol=TCP
--socket=file_name, -S file_name
On Unix, the name of the Unix socket file to use, for connections made via a named pipe to a local server. The default Unix socket file name is /tmp/mysql.sock.
On Unix, MySQL programs treat the host name localhost specially, in a way that is likely different from what you expect compared to other network-based programs. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file. This occurs even if a --port or -P option is given to specify a port number. To ensure that the client makes a TCP/IP connection to the local server, use --host or -h to specify a host name value of 127.0.0.1, or the IP address or name of the local server. You can also specify the connection protocol explicitly, even for localhost, by using the --protocol=TCP option. For example:
shell> mysql --host=127.0.0.1
shell> mysql --protocol=TCP
Connections to remote servers always use TCP/IP. This command connects to the server running on remote.example.com using the default port number (3306):
shell> mysql --host=remote.example.com
To specify a port number explicitly, use the --port or -P option:
shell> mysql --host=remote.example.com --port=13306
You can specify a port number for connections to a local server, too. However, as indicated previously, connections to localhost on Unix will use a socket file by default. You will need to force a TCP/IP connection as already described or any option that specifies a port number will be ignored.
For this command, the program uses a socket file on Unix and the --port option is ignored:
shell> mysql --port=13306 --host=localhost
To cause the port number to be used, invoke the program in either of these ways:
shell> mysql --port=13306 --host=127.0.0.1
shell> mysql --port=13306 --protocol=TCP
--socket=file_name, -S file_name
On Unix, the name of the Unix socket file to use, for connections made via a named pipe to a local server. The default Unix socket file name is /tmp/mysql.sock.
Apache 壓力測試
Apache 壓力測試
在 apache 本身的套件中, 有個叫 ab(ApacheBench) 的程式.
ApacheBench 主要是用來測試 apache 執行效率.
進階版
Apache 壓力測試(二) -- siege
在 apache 本身的套件中, 有個叫 ab(ApacheBench) 的程式.
ApacheBench 主要是用來測試 apache 執行效率.
進階版
Apache 壓力測試(二) -- siege
Phusion Passenger, now with Global Queuing
Phusion Passenger, now with Global Queuing
If global queuing is turned on, then Phusion Passenger will use a global queue that’s shared between all backend processes. If an HTTP request comes in, and all the backend processes are still busy, then Phusion Passenger will wait until at least one backend process is done, and will then forward the request to that process.
If global queuing is turned on, then Phusion Passenger will use a global queue that’s shared between all backend processes. If an HTTP request comes in, and all the backend processes are still busy, then Phusion Passenger will wait until at least one backend process is done, and will then forward the request to that process.
Passenger architectural overview
Ruby on Rails 架設原理
Passenger architectural overview
1.1. Typical web applications
1. 一個單純的web application 接受 http request 從某種I/O channel, 處理, 然後丟 http response 回去 client, 這種情況會一直loop 下去直到 application 結束.
這樣並不代表web application 必須能直接跟 HTTP 溝通, 而只是代表 web application 能夠接受某種http request representation
2. 很少 web application 直接跟HTPP client 溝通, 常見的方式有
Few web applications are accessible directly by HTTP clients. Common setups are:
1. web application 由 application server 控制, 一個 application server 可能有多個 web application, 然後 application server 在連結 web server . web server <-> application server <-> web application. A typical example of such a setup is a J2EE application, contained in the Tomcat web server, behind the Apache web server.
2. web application 由 web server 控制, 在這種情況 web server 就像 application server, 這個例子就是像 apacher server 用 mod_php 控制 php application. 注意這樣不代表 web application 都是跑在和web server 的 procsss 上, 它只是代表由 web server 來管理 web application
3. web application 本身就是 web server 能夠直接收 HTTP request , 這樣的代表是 Trac bug tracking system , running in its standalone server, 在很多的佈署時, 像這樣的 web application 會被設定在不同的 web server 的後段, 不讓他們直接收 HTTP request, 前端的web server 便如 reverse HTTP proxy 一般
4. web application 不能夠夠直接說 HTTP directly 但是卻能夠直接與web server 透過某種介面溝通, 這樣的例子是CGI, FastCGI 和 SCGI
上面這些都是真的有在運作的佈署模式, 沒有一個模式能做到而其他做不到, 對於client 而言 web server, application server , web application 就如一個 black box
另外也需注意的是上面各種模式並沒有特別針對偏向某種 I/O processing , The web servers, application servers, web applications, etc. could process I/O serially (i.e. one request at a time), could multiplex I/O with a single thread (e.g. by using select(2) or poll(2)) or it could process I/O with multiple threads and/or multiple processes.
1.2. Ruby on Rails
每個 Rails application 都有一個 dispatch . 這個dispacther 負責處理 HTTP request 但是它也不直接受HTPP request, 相反地, 而是它接受的是包含 HTTP request 資訊的某種格式, 所以了解這個dispatcher 對於想要發展能夠與 Rails 透過 HTTP 溝通軟體(例如 web server)的人就很重要
Rails dispatcher 只能過一次處理一個request, 因為rails 不是 thread-safe, 但實際上這也不是什麼大問題
另外一個需要注意的是 Rails application 需要很多memory 來存 program code , 啟動一隻 rails application 在 bootstrapping 花費很多時間
Handling of concurrent requests
就如上面提到的, 一個 Rails application instance 只能一次處理一個request, 這很明顯是不受歡迎的, 但是在我們去找解決方案時, 讓我們先來看一下"競爭對手"如何解決這個問題, PHP 也有同樣類似的問題, 一個 PHP script 只能夠一次處理一個 HTTP request
* mod_php 解決這個問題利用apache 的MPM 機制, 換句話說, mod_php 本身不做些什麼, 一個apache worker process/thread 一樣指能夠處理一個 php request, 但是Apache spawns multiple worker processes/threads.
*PHP-FastCGI 處理這個問題是藉由spawn multiple persistent PHP server , PHP server 之間的數目是和apache worker process/threads 的數目是不相關的
Passenger 不能使用mod_php的方法, 因為它會使我們 spawn a new Rails application 對於每一個來的request 這樣會導致非常的慢, 相反地Passenger 用PHP-FastCGI的方法, 我們 maintain 一個 application instance pool 當有request 來的時候我們將forward 這個request 到其中一個application instance 去, 這個 pool size 是可以設定的, 這對於管理者而言能夠去控制loading 和 memory
Apache
The Apache web server has a pluggable I/O multiprocessing (the ability to handle more than 1 concurrent HTTP client at the same time) architecture.
An Apache module which implements a particular multiprocessing strategy, is called a Multi-Processing Module (MPM). The prefork MPM — which also happens to be the default — appears to be the most popular one.
This MPM spawns multiple worker child processes. HTTP requests are first accepted by a so-called control process, and then forwarded to one of the worker processes. The next section contains a diagram which shows the prefork MPM's architecture.
Passenger architecture
Passenger's architecture is a lot like setup #2 described in Typical web applications.
In other words, Passenger extends Apache and allows it to act like an application server. Passenger's architecture — assuming Apache 2 with the prefork MPM is used — is shown in the following diagram:
Passenger consists of an Apache module, mod_passenger. This is written in C++, and can be found in the directory ext/apache2.
The module is active in the Apache control process and in all the Apache worker processes.
When an HTTP request comes in, mod_passenger will check whether the request should be handled by a Ruby on Rails application.
If so, then mod_passenger will spawn the corresponding Rails application (if necessary) and forward the request to that application.
It should be noted that the Ruby on Rails application does not run in the same address space as Apache.
This differentiates Passenger from other application-server-inside-web-server software such as mod_php, mod_perl and mod_ruby.
If the Rails application crashes or leak memory, it will have no effect on Apache. In fact, stability is one of our highest goals. Passenger is carefully designed and implemented so that Apache shouldn't crash because of Passenger.
Spawning and caching of code and applications
A very naive implementation of Passenger would spawn a Ruby on Rails application every time an HTTP request is received, just like CGI would.
However, spawning Ruby on Rails applications is expensive. It can take 1 or 2 seconds on a modern PC, and possibly much longer on a heavily loaded server. This overhead is particularily unacceptable on shared hosts.
A less naive implementation would keep spawned Ruby on Rails application instances alive, similar to how Lighttpd's FastCGI implementation works. However, this still has several problems:
1. The first request to a Rails website will be slow, and subsequent requests will be fast. But the first request to a different Rails website - on the same web server - will still be slow.
2. As we've explained earlier in this article, a lot of memory in a Rails application is spent on storing the AST of the Ruby on Rails framework and the application. Especially on shared hosts and on memory-constrained Virtual Private Servers (VPS), this can be a problem.
Both of these problems are very much solvable, and we've chosen to do just that.
The first problem can be solved by preloading Rails applications, i.e. by running the Rails application before a request is ever made to that website.
This is the approach taken by most Rails hosts, for example in the form of a Mongrel cluster which is running all the time. However, this is unacceptable for a shared host: such an application would just sit there and waste memory even if it's not doing anything. Instead, we've chosen to take a different approach, which solves both of the aforementioned problems.
We spawn Rails applications via a spawn server. The spawn server caches Ruby on Rails framework code and application code in memory.
Spawning a Rails application for the first time will still be slow, but subsequent spawn attempts will be very fast. Furthermore, because the framework code is cached independently from the application code, spawning a different Rails application will also be very fast, as long as that application is using a Rails framework version that has already been cached.
Another implication of the spawn server is that different Ruby on Rails will share memory with each other, thus solving problem #2. This is described in detail in the next section.
But despite the caching of framework code and application code, spawning is still expensive compared to an HTTP request. We want to avoid spawning whenever possible. This is why we've introduced the application pool.
Spawned application instances are kept alive, and their handles are stored into this pool, allowing each application instance to be reused later. Thus, Passenger has very good average case performance.
The application pool is shared between different worker processes.
Because the worker processes cannot share memory with each other, either shared memory must be used to implement the application pool, or a client/server architecture must be implemented.
We've chosen the latter because it is easier to implement. The Apache control process acts like a server for the application pool. However, this does not mean that all HTTP request/response data go through the control process.
A worker process queries the pool for a connection session with a Rails application. Once this session has been obtained, the worker process will communicate directly with the Rails application.
The application pool is implemented inside mod_passenger. One can find detailed documentation about it in the C++ API documentation, in particular the documentation about the ApplicationPool, StandardApplicationPool and ApplicationPoolServer classes.
The application pool is responsible for spawning applications, caching spawned applications' handles, and cleaning up applications which have been idle for an extended period of time.
The spawn server
The spawn server is written in Ruby, and its code can be found in the directory lib/passenger. Its main executable is bin/passenger-spawn-server. The spawn server's RDoc documentation documents the implementation in detail.
The spawn server consists of 3 logical layers:
1. The spawn manager. This is the topmost layer, and acts like a fascade for all the underlying layers. Clients who use the spawn server only communicate with this layer.
2. The framework spawner server. The spawn manager spawns a framework spawner server for each unique Ruby on Rails framework version.
Each framework spawner server caches the code for exactly one Ruby on Rails framework version. A spawn request for an application is forwarded to the framework spawner server that contains the correct Ruby on Rails version for the application.
3. The application spawner server. This is to the framework spawner server what the framework spawner server is to the spawn manager.
The framework spawner server spawns an application spawner server for each unique Ruby on Rails application (here “application” does not mean a running process, but a set of (source code) files). An application spawner server caches the code for exactly one application.
As you can see, we have two layers of code caching: when the spawn server receives a request to spawn a new application instance, it will forward the request to the correct framework spawner server (and will spawn that framework spawner server if it doesn't already exist), which — in turn — will forward it to the correct application spawner server (which will, again, be created if it doesn't already exist).
Each layer is only responsible for the layer directly below. The spawn manager only knows about framework spawner servers, and a framework spawner server only knows about its application spawner servers. The application spawner server is, however, not responsible for managing spawned application instances.
If an application instance is spawned by mod_passenger, its information will be sent back to mod_passenger, which will be fully responsible for managing the application instance's life time (through the application pool).
Also note that each layer is a seperate process. This is required because a single Ruby process can only load a single Ruby on Rails framework and a single application.
Memory sharing
On most modern Unix operating systems, when a child process is created, it will share most of its memory with the parent process.
Processes are not supposed to be able to access each others' memory, so the operating system makes a copy of a piece of memory when it is written to by the parent process or the child process.
This is called copy-on-write (COW). Detailed background information can be found on Ruby Enterprise Edition's website.
The spawn server makes use of this useful fact.
Each layer shares its Ruby AST memory with all of its lower layers, as long as the AST nodes in question haven't been written to.
This means that all spawned Rails applications will — if possible — share the Ruby on Rails framework's code, as well as its own application code, with each other. This results in a dramatic reduction in memory usage.
Passenger architectural overview
1.1. Typical web applications
1. 一個單純的web application 接受 http request 從某種I/O channel, 處理, 然後丟 http response 回去 client, 這種情況會一直loop 下去直到 application 結束.
這樣並不代表web application 必須能直接跟 HTTP 溝通, 而只是代表 web application 能夠接受某種http request representation
2. 很少 web application 直接跟HTPP client 溝通, 常見的方式有
Few web applications are accessible directly by HTTP clients. Common setups are:
1. web application 由 application server 控制, 一個 application server 可能有多個 web application, 然後 application server 在連結 web server . web server <-> application server <-> web application. A typical example of such a setup is a J2EE application, contained in the Tomcat web server, behind the Apache web server.
2. web application 由 web server 控制, 在這種情況 web server 就像 application server, 這個例子就是像 apacher server 用 mod_php 控制 php application. 注意這樣不代表 web application 都是跑在和web server 的 procsss 上, 它只是代表由 web server 來管理 web application
3. web application 本身就是 web server 能夠直接收 HTTP request , 這樣的代表是 Trac bug tracking system , running in its standalone server, 在很多的佈署時, 像這樣的 web application 會被設定在不同的 web server 的後段, 不讓他們直接收 HTTP request, 前端的web server 便如 reverse HTTP proxy 一般
4. web application 不能夠夠直接說 HTTP directly 但是卻能夠直接與web server 透過某種介面溝通, 這樣的例子是CGI, FastCGI 和 SCGI
上面這些都是真的有在運作的佈署模式, 沒有一個模式能做到而其他做不到, 對於client 而言 web server, application server , web application 就如一個 black box
另外也需注意的是上面各種模式並沒有特別針對偏向某種 I/O processing , The web servers, application servers, web applications, etc. could process I/O serially (i.e. one request at a time), could multiplex I/O with a single thread (e.g. by using select(2) or poll(2)) or it could process I/O with multiple threads and/or multiple processes.
1.2. Ruby on Rails
每個 Rails application 都有一個 dispatch . 這個dispacther 負責處理 HTTP request 但是它也不直接受HTPP request, 相反地, 而是它接受的是包含 HTTP request 資訊的某種格式, 所以了解這個dispatcher 對於想要發展能夠與 Rails 透過 HTTP 溝通軟體(例如 web server)的人就很重要
Rails dispatcher 只能過一次處理一個request, 因為rails 不是 thread-safe, 但實際上這也不是什麼大問題
另外一個需要注意的是 Rails application 需要很多memory 來存 program code , 啟動一隻 rails application 在 bootstrapping 花費很多時間
Handling of concurrent requests
就如上面提到的, 一個 Rails application instance 只能一次處理一個request, 這很明顯是不受歡迎的, 但是在我們去找解決方案時, 讓我們先來看一下"競爭對手"如何解決這個問題, PHP 也有同樣類似的問題, 一個 PHP script 只能夠一次處理一個 HTTP request
* mod_php 解決這個問題利用apache 的MPM 機制, 換句話說, mod_php 本身不做些什麼, 一個apache worker process/thread 一樣指能夠處理一個 php request, 但是Apache spawns multiple worker processes/threads.
*PHP-FastCGI 處理這個問題是藉由spawn multiple persistent PHP server , PHP server 之間的數目是和apache worker process/threads 的數目是不相關的
Passenger 不能使用mod_php的方法, 因為它會使我們 spawn a new Rails application 對於每一個來的request 這樣會導致非常的慢, 相反地Passenger 用PHP-FastCGI的方法, 我們 maintain 一個 application instance pool 當有request 來的時候我們將forward 這個request 到其中一個application instance 去, 這個 pool size 是可以設定的, 這對於管理者而言能夠去控制loading 和 memory
Apache
The Apache web server has a pluggable I/O multiprocessing (the ability to handle more than 1 concurrent HTTP client at the same time) architecture.
An Apache module which implements a particular multiprocessing strategy, is called a Multi-Processing Module (MPM). The prefork MPM — which also happens to be the default — appears to be the most popular one.
This MPM spawns multiple worker child processes. HTTP requests are first accepted by a so-called control process, and then forwarded to one of the worker processes. The next section contains a diagram which shows the prefork MPM's architecture.
Passenger architecture
Passenger's architecture is a lot like setup #2 described in Typical web applications.
In other words, Passenger extends Apache and allows it to act like an application server. Passenger's architecture — assuming Apache 2 with the prefork MPM is used — is shown in the following diagram:
Passenger consists of an Apache module, mod_passenger. This is written in C++, and can be found in the directory ext/apache2.
The module is active in the Apache control process and in all the Apache worker processes.
When an HTTP request comes in, mod_passenger will check whether the request should be handled by a Ruby on Rails application.
If so, then mod_passenger will spawn the corresponding Rails application (if necessary) and forward the request to that application.
It should be noted that the Ruby on Rails application does not run in the same address space as Apache.
This differentiates Passenger from other application-server-inside-web-server software such as mod_php, mod_perl and mod_ruby.
If the Rails application crashes or leak memory, it will have no effect on Apache. In fact, stability is one of our highest goals. Passenger is carefully designed and implemented so that Apache shouldn't crash because of Passenger.
Spawning and caching of code and applications
A very naive implementation of Passenger would spawn a Ruby on Rails application every time an HTTP request is received, just like CGI would.
However, spawning Ruby on Rails applications is expensive. It can take 1 or 2 seconds on a modern PC, and possibly much longer on a heavily loaded server. This overhead is particularily unacceptable on shared hosts.
A less naive implementation would keep spawned Ruby on Rails application instances alive, similar to how Lighttpd's FastCGI implementation works. However, this still has several problems:
1. The first request to a Rails website will be slow, and subsequent requests will be fast. But the first request to a different Rails website - on the same web server - will still be slow.
2. As we've explained earlier in this article, a lot of memory in a Rails application is spent on storing the AST of the Ruby on Rails framework and the application. Especially on shared hosts and on memory-constrained Virtual Private Servers (VPS), this can be a problem.
Both of these problems are very much solvable, and we've chosen to do just that.
The first problem can be solved by preloading Rails applications, i.e. by running the Rails application before a request is ever made to that website.
This is the approach taken by most Rails hosts, for example in the form of a Mongrel cluster which is running all the time. However, this is unacceptable for a shared host: such an application would just sit there and waste memory even if it's not doing anything. Instead, we've chosen to take a different approach, which solves both of the aforementioned problems.
We spawn Rails applications via a spawn server. The spawn server caches Ruby on Rails framework code and application code in memory.
Spawning a Rails application for the first time will still be slow, but subsequent spawn attempts will be very fast. Furthermore, because the framework code is cached independently from the application code, spawning a different Rails application will also be very fast, as long as that application is using a Rails framework version that has already been cached.
Another implication of the spawn server is that different Ruby on Rails will share memory with each other, thus solving problem #2. This is described in detail in the next section.
But despite the caching of framework code and application code, spawning is still expensive compared to an HTTP request. We want to avoid spawning whenever possible. This is why we've introduced the application pool.
Spawned application instances are kept alive, and their handles are stored into this pool, allowing each application instance to be reused later. Thus, Passenger has very good average case performance.
The application pool is shared between different worker processes.
Because the worker processes cannot share memory with each other, either shared memory must be used to implement the application pool, or a client/server architecture must be implemented.
We've chosen the latter because it is easier to implement. The Apache control process acts like a server for the application pool. However, this does not mean that all HTTP request/response data go through the control process.
A worker process queries the pool for a connection session with a Rails application. Once this session has been obtained, the worker process will communicate directly with the Rails application.
The application pool is implemented inside mod_passenger. One can find detailed documentation about it in the C++ API documentation, in particular the documentation about the ApplicationPool, StandardApplicationPool and ApplicationPoolServer classes.
The application pool is responsible for spawning applications, caching spawned applications' handles, and cleaning up applications which have been idle for an extended period of time.
The spawn server
The spawn server is written in Ruby, and its code can be found in the directory lib/passenger. Its main executable is bin/passenger-spawn-server. The spawn server's RDoc documentation documents the implementation in detail.
The spawn server consists of 3 logical layers:
1. The spawn manager. This is the topmost layer, and acts like a fascade for all the underlying layers. Clients who use the spawn server only communicate with this layer.
2. The framework spawner server. The spawn manager spawns a framework spawner server for each unique Ruby on Rails framework version.
Each framework spawner server caches the code for exactly one Ruby on Rails framework version. A spawn request for an application is forwarded to the framework spawner server that contains the correct Ruby on Rails version for the application.
3. The application spawner server. This is to the framework spawner server what the framework spawner server is to the spawn manager.
The framework spawner server spawns an application spawner server for each unique Ruby on Rails application (here “application” does not mean a running process, but a set of (source code) files). An application spawner server caches the code for exactly one application.
As you can see, we have two layers of code caching: when the spawn server receives a request to spawn a new application instance, it will forward the request to the correct framework spawner server (and will spawn that framework spawner server if it doesn't already exist), which — in turn — will forward it to the correct application spawner server (which will, again, be created if it doesn't already exist).
Each layer is only responsible for the layer directly below. The spawn manager only knows about framework spawner servers, and a framework spawner server only knows about its application spawner servers. The application spawner server is, however, not responsible for managing spawned application instances.
If an application instance is spawned by mod_passenger, its information will be sent back to mod_passenger, which will be fully responsible for managing the application instance's life time (through the application pool).
Also note that each layer is a seperate process. This is required because a single Ruby process can only load a single Ruby on Rails framework and a single application.
Memory sharing
On most modern Unix operating systems, when a child process is created, it will share most of its memory with the parent process.
Processes are not supposed to be able to access each others' memory, so the operating system makes a copy of a piece of memory when it is written to by the parent process or the child process.
This is called copy-on-write (COW). Detailed background information can be found on Ruby Enterprise Edition's website.
The spawn server makes use of this useful fact.
Each layer shares its Ruby AST memory with all of its lower layers, as long as the AST nodes in question haven't been written to.
This means that all spawned Rails applications will — if possible — share the Ruby on Rails framework's code, as well as its own application code, with each other. This results in a dramatic reduction in memory usage.
2009年4月24日 星期五
2009年4月21日 星期二
ultrasphinx has_many, belongs_to 的建法
[分享]Rails中ultrasphinx全文檢索的使用(對象間關聯的處理)
深入探索:Windows+Rails+Sphinx進行中文全文檢索
belongs_to 的時候用 include
has_many 的時候用concatenat
深入探索:Windows+Rails+Sphinx進行中文全文檢索
belongs_to 的時候用 include
has_many 的時候用concatenat
2009年4月18日 星期六
Apache , virtualhost
Host websites on your local machine using Apache websever
配置 Apache 支援多個網域
VirtualHost Directive
Using Name-based Virtual Hosts
NameVirtualHost Directive
這邊設定主要是針對apache2
Configuration details
Let us assume for the sake of this tutorial that I have decided to host two websites on my local machine. All the files related to the two websites have already been created and saved in two separate directories by name websiteA and websiteB . The default location for serving the files in apache is usually in the /var/www location. So I move the two directories websiteA and websiteB to this location.
然後把原本/etc/apache2/sites-available/ 下的default 複製一份改名叫 websiteA
然後把NameVirtualHost, ServerName, DocumentRoot 等改成想要的名字
然後在site-enabled建立softlink
重起server 就行了
配置 Apache 支援多個網域
VirtualHost Directive
Using Name-based Virtual Hosts
NameVirtualHost Directive
這邊設定主要是針對apache2
Configuration details
Let us assume for the sake of this tutorial that I have decided to host two websites on my local machine. All the files related to the two websites have already been created and saved in two separate directories by name websiteA and websiteB . The default location for serving the files in apache is usually in the /var/www location. So I move the two directories websiteA and websiteB to this location.
然後把原本/etc/apache2/sites-available/ 下的default 複製一份改名叫 websiteA
然後把NameVirtualHost, ServerName, DocumentRoot 等改成想要的名字
然後在site-enabled建立softlink
重起server 就行了
2009年4月17日 星期五
升級 passenger 給rails 2.3.2
Upgrading Phusion Passenger for Rails 2.3.2
gem install passenger 先裝新的passenger
passenger-install-apache2-module 再跑一次compile the corresponding Apache module and make changes to your Apache conf file
把apache2.conf 裡的設定換成新的
gem install passenger 先裝新的passenger
passenger-install-apache2-module 再跑一次compile the corresponding Apache module and make changes to your Apache conf file
把apache2.conf 裡的設定換成新的
2009年4月15日 星期三
High performance mysql repliaction memo(3)
Sending Replication Events to Other Slaves
The log_slave_updates option lets you use a slave as a master of other slaves. It
instructs MySQL to write the events the slave SQL thread executes into its own
binary log, which its own slaves can then retrieve and execute.
Unless you’ve taken care to give each server a unique server ID, configuring a slave in
this manner can cause subtle errors and may even cause replication to complain and stop.
MySQL prevents an infinite loop in replication. When the slave SQL thread reads the relay log, it discards any event whose server ID matches its own. This breaks infinite loops in replication.
Preventing infinite loops is important for some of the more useful replication
topologies, such as master-master replication.
Replication Topologies
basic rules:
• A MySQL slave instance can have only one master.
• Every slave must have a unique server ID.
• A master can have many slaves (or, correspondingly, a slave can have many siblings).
• A slave can propagate changes from its master, and be the master of other slaves, if you enable log_slave_updates.
If you set up a master-master active-active configuration carefully, perhaps with wellpartitioned data and privileges, you can avoid some of these problems.* However, it’s
hard to do well, and there’s usually a better way to achieve what you need.
MySQL Does Not Support Multimaster Replication
The log_slave_updates option lets you use a slave as a master of other slaves. It
instructs MySQL to write the events the slave SQL thread executes into its own
binary log, which its own slaves can then retrieve and execute.
Unless you’ve taken care to give each server a unique server ID, configuring a slave in
this manner can cause subtle errors and may even cause replication to complain and stop.
MySQL prevents an infinite loop in replication. When the slave SQL thread reads the relay log, it discards any event whose server ID matches its own. This breaks infinite loops in replication.
Preventing infinite loops is important for some of the more useful replication
topologies, such as master-master replication.
Replication Topologies
basic rules:
• A MySQL slave instance can have only one master.
• Every slave must have a unique server ID.
• A master can have many slaves (or, correspondingly, a slave can have many siblings).
• A slave can propagate changes from its master, and be the master of other slaves, if you enable log_slave_updates.
If you set up a master-master active-active configuration carefully, perhaps with wellpartitioned data and privileges, you can avoid some of these problems.* However, it’s
hard to do well, and there’s usually a better way to achieve what you need.
MySQL Does Not Support Multimaster Replication
High performance mysql repliaction memo(2)
MySQL參數說明
If a slave server is very far behind its master, the slave I/O thread can write many relay logs. The slave SQL thread will remove them as soon as it finishes replaying them (you can change this with the relay_log_purge option),
but if it is running far behind, the I/O thread could actually fill up the disk.
The solution to this problem is the relay_log_space_limit configuration variable. If the total size of all the relay logs grows larger than this variable’s size, the I/O thread will stop and wait for the SQL thread to free up some more disk space
log_slave_updates (to make the slave log the replicated events to its own binary log).
Some people enable just the binary log and not log_slave_updates, so they can see
whether anything, such as a misconfigured application, is modifying data on the slave.
如果使用鏈狀同步或者多台Slave之間進行同步則需要開啟此參數。
Initializing a Slave from Another Server
Using mysqldump
If you use only InnoDB tables, you can use the following command to dump
everything from the master, load it all into the slave, and change the slave’s coordinates
to the corresponding position in the master’s binary log:
$ mysqldump --single-transaction --all-databases --master-data=1
--host=server1 | mysql --host=server2
The --single-transaction option causes the dump to read the data as it existed at
the beginning of the transaction. This option may work with other transactional
storage engines as well, but we haven’t tested it. If you’re not using transactional
tables, you can use the --lock-all-tables option to get a consistent dump of
all tables.
Because neither format is perfect for every situation, MySQL 5.1 switches between
statement-based and row-based replication dynamically. By default, it uses
statement-based replication, but when it detects an event that cannot be replicated
correctly with a statement, it switches to row-based replication. You can also control
the format as needed by setting the binlog_format session variable.
Replication Files
mysql-bin.index
A server that has binary logging enabled will also have a file named the same as
the binary logs, but with a .index suffix. This file keeps track of the binary log
files that exist on disk
mysql-relay-bin.index
This file serves the same purpose for the relay logs as the binary log index file
does for the binary logs.
master.info
This file contains the information a slave server needs to connect to its master.
The format is plain text (one value per line) and varies between MySQL versions.
Don’t delete it, or your slave will not know how to connect to its master after it
restarts. This file contains the replication user’s password, in plain text, so you
may want to restrict its permissions.
These files are a rather crude way of recording MySQL’s replication and logging
state. Unfortunately, they are not written synchronously, so if your server loses
power and the files haven’t yet been flushed to disk, they can be inaccurate when the
server restarts.
By default, the binary logs are named after the server’s hostname with a numeric suffix,
but it’s a good idea to name them explicitly in my.cnf, as in the following
example:
log_bin # Don't do this, or files will be named after the hostname
log_bin = mysql-bin # This is safe
If a slave server is very far behind its master, the slave I/O thread can write many relay logs. The slave SQL thread will remove them as soon as it finishes replaying them (you can change this with the relay_log_purge option),
but if it is running far behind, the I/O thread could actually fill up the disk.
The solution to this problem is the relay_log_space_limit configuration variable. If the total size of all the relay logs grows larger than this variable’s size, the I/O thread will stop and wait for the SQL thread to free up some more disk space
log_slave_updates (to make the slave log the replicated events to its own binary log).
Some people enable just the binary log and not log_slave_updates, so they can see
whether anything, such as a misconfigured application, is modifying data on the slave.
如果使用鏈狀同步或者多台Slave之間進行同步則需要開啟此參數。
Initializing a Slave from Another Server
Using mysqldump
If you use only InnoDB tables, you can use the following command to dump
everything from the master, load it all into the slave, and change the slave’s coordinates
to the corresponding position in the master’s binary log:
$ mysqldump --single-transaction --all-databases --master-data=1
--host=server1 | mysql --host=server2
The --single-transaction option causes the dump to read the data as it existed at
the beginning of the transaction. This option may work with other transactional
storage engines as well, but we haven’t tested it. If you’re not using transactional
tables, you can use the --lock-all-tables option to get a consistent dump of
all tables.
Because neither format is perfect for every situation, MySQL 5.1 switches between
statement-based and row-based replication dynamically. By default, it uses
statement-based replication, but when it detects an event that cannot be replicated
correctly with a statement, it switches to row-based replication. You can also control
the format as needed by setting the binlog_format session variable.
Replication Files
mysql-bin.index
A server that has binary logging enabled will also have a file named the same as
the binary logs, but with a .index suffix. This file keeps track of the binary log
files that exist on disk
mysql-relay-bin.index
This file serves the same purpose for the relay logs as the binary log index file
does for the binary logs.
master.info
This file contains the information a slave server needs to connect to its master.
The format is plain text (one value per line) and varies between MySQL versions.
Don’t delete it, or your slave will not know how to connect to its master after it
restarts. This file contains the replication user’s password, in plain text, so you
may want to restrict its permissions.
These files are a rather crude way of recording MySQL’s replication and logging
state. Unfortunately, they are not written synchronously, so if your server loses
power and the files haven’t yet been flushed to disk, they can be inaccurate when the
server restarts.
By default, the binary logs are named after the server’s hostname with a numeric suffix,
but it’s a good idea to name them explicitly in my.cnf, as in the following
example:
log_bin # Don't do this, or files will be named after the hostname
log_bin = mysql-bin # This is safe
Advanced MySQL Replication Techniques and High performance mysql(1)
Advanced MySQL Replication Techniques
如果要同時使用兩台db, 而且又需要互相replication 那麼就需要這兩個 auto_increment_increment, auto_increment_offset 參數, 這是為了讓兩邊自動增加的 primary key 像是id 不要一致, 這樣相互repliaction 的時候才不會有相同的id 出現
如果出現問題的時候, 就先看error log, 找不到log 時, 就直接設參數
log-error = /var/log/mysql/error.log
以我的例子來說
連不上, 為啥會連不上勒, telnet 192.168.11.103 3306 一下, 真的連不上 = =+
到103 查了一下 才發現上面的mysql server port 開在4040 囧
改好之後就連上了 = =+
還有有問題的時候可以在用show slave status 看一下, 像上面有問題的情況就會看到
Slave_IO_Running 是NO
Slave_IO_State 是Connecting to master
表示連到master 那邊有問題
如果正常的話
Slave_IO_Running 是yes
Slave_IO_State 是Waiting for master to send event
據High performance Mysql 書上的建議
別再把slave 連master 的 master_host, master_port 的設定放在my.cnf 裡, 可能會有問題而且沒有啥優點
請直接用CHANGE MASTER TO 指令去改
MASTER_LOG_POS=0 , 0不是指真正的log position 而是指at the start of log file
Waiting for master to send event 意思是 has fetched all of the master's binary logs
sync_binlog=1 , This makers MySQL synchronize the binary log's contents to disk each time it commits a transaction, so you dont lose log events if there's a crash
innodb_flush_logs_at_trx_commit=1 #Flush every log write
innodb_support_xa=1 #MySQL 5.0 and newer only
innodb_safe_binlog #MySQL 4.1 only
skip_slave_start will prevent the slave from starting automatically after a crash, which can give you a chance to repair a server if it has problems.
Even if you have enabled all the options we suggested, a slave can easily break after a crash, because the relay logs and master.info file are not crash-safe. They are not even flushed to disk, and there is no configuration option to control that behavior
如果要同時使用兩台db, 而且又需要互相replication 那麼就需要這兩個 auto_increment_increment, auto_increment_offset 參數, 這是為了讓兩邊自動增加的 primary key 像是id 不要一致, 這樣相互repliaction 的時候才不會有相同的id 出現
如果出現問題的時候, 就先看error log, 找不到log 時, 就直接設參數
log-error = /var/log/mysql/error.log
以我的例子來說
連不上, 為啥會連不上勒, telnet 192.168.11.103 3306 一下, 真的連不上 = =+
到103 查了一下 才發現上面的mysql server port 開在4040 囧
改好之後就連上了 = =+
還有有問題的時候可以在用show slave status 看一下, 像上面有問題的情況就會看到
Slave_IO_Running 是NO
Slave_IO_State 是Connecting to master
表示連到master 那邊有問題
如果正常的話
Slave_IO_Running 是yes
Slave_IO_State 是Waiting for master to send event
據High performance Mysql 書上的建議
別再把slave 連master 的 master_host, master_port 的設定放在my.cnf 裡, 可能會有問題而且沒有啥優點
請直接用CHANGE MASTER TO 指令去改
MASTER_LOG_POS=0 , 0不是指真正的log position 而是指at the start of log file
Waiting for master to send event 意思是 has fetched all of the master's binary logs
sync_binlog=1 , This makers MySQL synchronize the binary log's contents to disk each time it commits a transaction, so you dont lose log events if there's a crash
innodb_flush_logs_at_trx_commit=1 #Flush every log write
innodb_support_xa=1 #MySQL 5.0 and newer only
innodb_safe_binlog #MySQL 4.1 only
skip_slave_start will prevent the slave from starting automatically after a crash, which can give you a chance to repair a server if it has problems.
Even if you have enabled all the options we suggested, a slave can easily break after a crash, because the relay logs and master.info file are not crash-safe. They are not even flushed to disk, and there is no configuration option to control that behavior
2009年4月14日 星期二
Live Backups of MySQL Using Replication
Live Backups of MySQL Using Replication
當repliaction 運作起來時, 當一個sql 在master 執行時, MySQL將會把他們和log position紀錄在binary log (bin.log) 裡, 在slave 那邊會透過IO thread 讀取binary log 來找變動的地方, 如果找到變動的地方, 就會把它copy 到 relay.log裡, 然後它會紀錄新的位置(copy到哪)到master.info 裡, 這些都是在slave server 端
然後slave 會再次利用相同的IO thread 做check, 當slave server 偵測到自己relay log 的變動, 將會透過SQL thread 執行在relay log 裡的動作
作為一個安全守護者, slave 也會透過SQL thread 比對它自己和master 的資料, 如果資料不一致, replication 的動作變會停止而 錯誤訊息將會紀錄在error log 裡, 如果資料一致, 那copy 到哪之新的資料點將會紀錄在slave 的 relay-log.info 裡, 繼續偵測下次的變動
基於安全性的理由, 應該在另外建一個user 來做這件事 replication
這個指定了bin.log 要放在哪和檔名, 當然要確認目錄存在和 mysql 這個使用者有權限存取該目錄
當 server 重起或是log flush, 對於.log 結尾的file(也就是像上面那個), 會被自動取代成index number 的 file (ex: 000001) 當作新的file log name
像這邊log-bin-index 指定的這個file, log-bin.index 是用來紀錄現在用了哪些binary log, 當server restart 或者是log flushed 時, 這個同時 log file 改變的名字會記在這
log-error 用來指定error 要存的地方
對於master server binary log 變動做的relay log 是為了performance 考量
relay-log-info-file 便是指定要存slave 和 master 變動位置這種file 的name 和位置
relay log index 是用來紀錄現在有用來做replicating 的file name
如果想要只有root 能 access 的話, 我們可以改變max_connections 的值
雖然把值設成0 表示所有的連線都不允許, 只剩一個連線保留給root, 要看是不是還有其他的連線就用 SHOW PROCESSLIST, 要結束該連線就用KILL
要備份資料可以用mysqldump
extended-insert 參數將會create multiple-row INSERT 來讓備份快點
master-data 會lock 住整個table 不讓人家寫, 可是可以讀, 還有這個參數會加下面幾行在dump 出來file的結尾
當這個file 在 slave server 執行時, slave 將會執行該file 該位置後的SQL statement
START SLAVE;
After this statement is run, the slave will connect to the master and get the changes it missed since the backup. From there, it will stay current by continuously checking the binary log as outlined before.
STOP SLAVE;
slave server 知道停下來的位置, 所以我們可以在這個時候做slave server 的backup 動作, 完成之後在START SLAVE;
當repliaction 運作起來時, 當一個sql 在master 執行時, MySQL將會把他們和log position紀錄在binary log (bin.log) 裡, 在slave 那邊會透過IO thread 讀取binary log 來找變動的地方, 如果找到變動的地方, 就會把它copy 到 relay.log裡, 然後它會紀錄新的位置(copy到哪)到master.info 裡, 這些都是在slave server 端
然後slave 會再次利用相同的IO thread 做check, 當slave server 偵測到自己relay log 的變動, 將會透過SQL thread 執行在relay log 裡的動作
作為一個安全守護者, slave 也會透過SQL thread 比對它自己和master 的資料, 如果資料不一致, replication 的動作變會停止而 錯誤訊息將會紀錄在error log 裡, 如果資料一致, 那copy 到哪之新的資料點將會紀錄在slave 的 relay-log.info 裡, 繼續偵測下次的變動
基於安全性的理由, 應該在另外建一個user 來做這件事 replication
這個指定了bin.log 要放在哪和檔名, 當然要確認目錄存在和 mysql 這個使用者有權限存取該目錄
當 server 重起或是log flush, 對於.log 結尾的file(也就是像上面那個), 會被自動取代成index number 的 file (ex: 000001) 當作新的file log name
像這邊log-bin-index 指定的這個file, log-bin.index 是用來紀錄現在用了哪些binary log, 當server restart 或者是log flushed 時, 這個同時 log file 改變的名字會記在這
log-error 用來指定error 要存的地方
對於master server binary log 變動做的relay log 是為了performance 考量
relay-log-info-file 便是指定要存slave 和 master 變動位置這種file 的name 和位置
relay log index 是用來紀錄現在有用來做replicating 的file name
如果想要只有root 能 access 的話, 我們可以改變max_connections 的值
雖然把值設成0 表示所有的連線都不允許, 只剩一個連線保留給root, 要看是不是還有其他的連線就用 SHOW PROCESSLIST, 要結束該連線就用KILL
要備份資料可以用mysqldump
extended-insert 參數將會create multiple-row INSERT 來讓備份快點
master-data 會lock 住整個table 不讓人家寫, 可是可以讀, 還有這個參數會加下面幾行在dump 出來file的結尾
當這個file 在 slave server 執行時, slave 將會執行該file 該位置後的SQL statement
START SLAVE;
After this statement is run, the slave will connect to the master and get the changes it missed since the backup. From there, it will stay current by continuously checking the binary log as outlined before.
STOP SLAVE;
slave server 知道停下來的位置, 所以我們可以在這個時候做slave server 的backup 動作, 完成之後在START SLAVE;
2009年4月10日 星期五
Mysql replication memo(4)
16.3.4.5: What issues should I be aware of when setting up two-way replication?
MySQL replication currently does not support any locking protocol between master and slave to guarantee the atomicity of a distributed (cross-server) update. In other words, it is possible for client A to make an update to co-master 1, and in the meantime, before it propagates to co-master 2, client B could make an update to co-master 2 that makes the update of client A work differently than it did on co-master 1. Thus, when the update of client A makes it to co-master 2, it produces tables that are different from what you have on co-master 1, even after all the updates from co-master 2 have also propagated. This means that you should not chain two servers together in a two-way replication relationship unless you are sure that your updates can safely happen in any order, or unless you take care of mis-ordered updates somehow in the client code.
You should also realize that two-way replication actually does not improve performance very much (if at all) as far as updates are concerned. Each server must do the same number of updates, just as you would have a single server do. The only difference is that there is a little less lock contention, because the updates originating on another server are serialized in one slave thread. Even this benefit might be offset by network delays.
MySQL replication currently does not support any locking protocol between master and slave to guarantee the atomicity of a distributed (cross-server) update. In other words, it is possible for client A to make an update to co-master 1, and in the meantime, before it propagates to co-master 2, client B could make an update to co-master 2 that makes the update of client A work differently than it did on co-master 1. Thus, when the update of client A makes it to co-master 2, it produces tables that are different from what you have on co-master 1, even after all the updates from co-master 2 have also propagated. This means that you should not chain two servers together in a two-way replication relationship unless you are sure that your updates can safely happen in any order, or unless you take care of mis-ordered updates somehow in the client code.
You should also realize that two-way replication actually does not improve performance very much (if at all) as far as updates are concerned. Each server must do the same number of updates, just as you would have a single server do. The only difference is that there is a little less lock contention, because the updates originating on another server are serialized in one slave thread. Even this benefit might be offset by network delays.
Mysql replication memo(3)
16.1.1.3. Setting the Replication Slave Configuration
每個slave 也都必須有unique 的 server ID
The only option you must configure on the slave is to set the unique server ID. If this option is not already set, or the current value conflicts with the value that you have chosen for the master server, then you should shut down your slave server, and edit the configuration to specify the server ID.
一般而言 你不需要在slave, enable binary logging, 但是你還是可以開啟, 來幫助你做data backup 和 recovery 甚至可以用在更複雜的環境, 例如 slave 變成master 的情況
You do not have to enable binary logging on the slave for replication to be enabled. However, if you enable binary logging on the slave then you can use the binary log for data backups and crash recovery on the slave, and also use the slave as part of a more complex replication topology (for example, where the slave acts as a master to other slaves).
為了設定 replication 在 slave 上, 你比需要先知道master 現在的current point 是在 binary log 哪
To configure replication on the slave you must determine the master's current point within the master binary log. You will need this information so that when the slave starts the replication process, it is able to start processing events from the binary log at the correct point.
如果你已經有data 在master 上的話, 你必須先把master 停下來, 之後把資料都dump 到slave 上
If you have existing data on your master that you want to synchronize on your slaves before starting the replication process, then you must stop processing statements on the master, obtain the current position, and then dump the data, before allowing the master to continue executing statements. If you do not stop the execution of statements, the data dump and the master status information that you use will not match and you will end up with inconsistent or corrupted databases on the slaves.
16.1.1.8. Setting Up Replication with Existing Data
Once a slave is replicating, you can find in its data directory one file named master.info and another named relay-log.info. The slave uses these two files to keep track of how much of the master's binary log it has processed. Do not remove or edit these files unless you know exactly what you are doing and fully understand the implications. Even in that case, it is preferred that you use the CHANGE MASTER TO statement to change replication parameters. The slave will use the values specified in the statement to update the status files automatically.
每個slave 也都必須有unique 的 server ID
The only option you must configure on the slave is to set the unique server ID. If this option is not already set, or the current value conflicts with the value that you have chosen for the master server, then you should shut down your slave server, and edit the configuration to specify the server ID.
一般而言 你不需要在slave, enable binary logging, 但是你還是可以開啟, 來幫助你做data backup 和 recovery 甚至可以用在更複雜的環境, 例如 slave 變成master 的情況
You do not have to enable binary logging on the slave for replication to be enabled. However, if you enable binary logging on the slave then you can use the binary log for data backups and crash recovery on the slave, and also use the slave as part of a more complex replication topology (for example, where the slave acts as a master to other slaves).
為了設定 replication 在 slave 上, 你比需要先知道master 現在的current point 是在 binary log 哪
To configure replication on the slave you must determine the master's current point within the master binary log. You will need this information so that when the slave starts the replication process, it is able to start processing events from the binary log at the correct point.
如果你已經有data 在master 上的話, 你必須先把master 停下來, 之後把資料都dump 到slave 上
If you have existing data on your master that you want to synchronize on your slaves before starting the replication process, then you must stop processing statements on the master, obtain the current position, and then dump the data, before allowing the master to continue executing statements. If you do not stop the execution of statements, the data dump and the master status information that you use will not match and you will end up with inconsistent or corrupted databases on the slaves.
16.1.1.8. Setting Up Replication with Existing Data
Once a slave is replicating, you can find in its data directory one file named master.info and another named relay-log.info. The slave uses these two files to keep track of how much of the master's binary log it has processed. Do not remove or edit these files unless you know exactly what you are doing and fully understand the implications. Even in that case, it is preferred that you use the CHANGE MASTER TO statement to change replication parameters. The slave will use the values specified in the statement to update the status files automatically.
Mysql replication memo(2)
16.1. Replication Configuration
Mysql instance 在master 的設定下會把 update 和 change 的動作寫到 binary log 裡
The MySQL instance operating as the master (the source of the database changes) writes updates and changes as “events” to the binary log
在binary log 裡的資料會根據database OOXX存成不同的格式, slave 設定則會去拿master 的binary log 來放在local 讀
The information in the binary log is stored in different logging formats according to the database changes being recorded. Slaves are configured to read the binary log from the master and to execute the events in the binary log on the slave's local database
如果binary logging 的設定被enable 那麼所有的改變就會存在binary log 裡, 每個slave 都會完整個master binary log 備份, slave 會決定binary log裡的哪些動作要執行
The master is “dumb” in this scenario. Once binary logging has been enabled, all statements are recorded in the binary log. Each slave will receive a copy of the entire contents of the binary log. It is the responsibility of the slave to decide which statements in the binary log should be executed;
這樣的意思是說可以很多的slave 連 master , 執行相同binary log 的不同部份, 因為slave 可能會連不上slave 但是依然不會影響master 的操作, 因為slave 會去記得他跑到binary log 的位置, 所以slave 斷線之後, 可以從當初的位置繼續處理沒處理完的部份
This means that multiple slaves can be connected to the master and executing different parts of the same binary log. Because the slaves control this process, individual slaves can be connected and disconnected from the server without affecting the master's operation. Also, because each slave remembers the position within the binary log, it is possible for slaves to be disconnected, reconnect and then “catch up” by continuing from the recorded position.
master 和 slave 都必須設定一個unique ID (用 server-id 的參數), 此外slave 必須設定master host 的 name 和 log 的 name 和 位置, 這些細節可以透過CHANGE MASTER TO 做更改, 這些細節都存在master.info 裡
Both the master and each slave must be configured with a unique ID (using the server-id option). In addition, the slave must be configured with information about the master host name, log file name and position within that file. These details can be controlled from within a MySQL session using the CHANGE MASTER TO statement. The details are stored within the master.info file.
存在binary log 的Event 有不同的格式
Events in the binary log are recorded using a number of formats. These are referred to as statement-based replication (SBR) or row-based replication (RBR). A third type, mixed-format replication (MIXED), uses SBR or RBR replication automatically to take advantage of the benefits of both SBR and RBR formats when appropriate.
Mysql instance 在master 的設定下會把 update 和 change 的動作寫到 binary log 裡
The MySQL instance operating as the master (the source of the database changes) writes updates and changes as “events” to the binary log
在binary log 裡的資料會根據database OOXX存成不同的格式, slave 設定則會去拿master 的binary log 來放在local 讀
The information in the binary log is stored in different logging formats according to the database changes being recorded. Slaves are configured to read the binary log from the master and to execute the events in the binary log on the slave's local database
如果binary logging 的設定被enable 那麼所有的改變就會存在binary log 裡, 每個slave 都會完整個master binary log 備份, slave 會決定binary log裡的哪些動作要執行
The master is “dumb” in this scenario. Once binary logging has been enabled, all statements are recorded in the binary log. Each slave will receive a copy of the entire contents of the binary log. It is the responsibility of the slave to decide which statements in the binary log should be executed;
這樣的意思是說可以很多的slave 連 master , 執行相同binary log 的不同部份, 因為slave 可能會連不上slave 但是依然不會影響master 的操作, 因為slave 會去記得他跑到binary log 的位置, 所以slave 斷線之後, 可以從當初的位置繼續處理沒處理完的部份
This means that multiple slaves can be connected to the master and executing different parts of the same binary log. Because the slaves control this process, individual slaves can be connected and disconnected from the server without affecting the master's operation. Also, because each slave remembers the position within the binary log, it is possible for slaves to be disconnected, reconnect and then “catch up” by continuing from the recorded position.
master 和 slave 都必須設定一個unique ID (用 server-id 的參數), 此外slave 必須設定master host 的 name 和 log 的 name 和 位置, 這些細節可以透過CHANGE MASTER TO 做更改, 這些細節都存在master.info 裡
Both the master and each slave must be configured with a unique ID (using the server-id option). In addition, the slave must be configured with information about the master host name, log file name and position within that file. These details can be controlled from within a MySQL session using the CHANGE MASTER TO statement. The details are stored within the master.info file.
存在binary log 的Event 有不同的格式
Events in the binary log are recorded using a number of formats. These are referred to as statement-based replication (SBR) or row-based replication (RBR). A third type, mixed-format replication (MIXED), uses SBR or RBR replication automatically to take advantage of the benefits of both SBR and RBR formats when appropriate.
2009年4月9日 星期四
memcache memo(1)
Memcache FAQ
Memcached 透過兩種hash 來完成, 一個是透過對key 做 hash 來, 找所對應的 server, 第二個就是找其相對應的值了, 當然不同的client implement 這些過程也會有所不同
When doing a memcached lookup, first the client hashes the key against the whole list of servers. Once it has chosen a server, the client then sends its request, and the server does an internal hash key lookup for the actual item data.
For example, if we have clients 1, 2, 3, and servers A, B, C:
Client 1 wants to set key "foo" with value "barbaz". Client 1 takes the full list of servers (A, B, C), hashes the key against them, then lets say ends up picking server B. Client 1 then directly connects to server B, and sets key "foo" with value "barbaz". Next, client 2 wants to get key "foo". Client 2 runs the same client library as client 1, and has the same server list (A, B, C). It is able to use the same hashing process to figure out key "foo" is on server B. It then directly requests key "foo" and gets back "barbaz".
Can I use different size caches across servers and will memcached use the servers with more memory efficiently?
Memcache's hashing algorithm that determines which server a key is cached on does not take into account memory sizes across servers. But a workaround may be to run multiple memcached instances on your server with more memory with each instance using the same size cache as all your other servers.
Memcached is not faster than my database. Why?
In a one to one comparison, memcached may not be faster than your SQL queries. However, this is not its goal. Memached's goal is scalability. As connections and requests increase, memcached will perform better than most database only solutions. Please test your code under high load with simultaneous connections and requests before deciding memcached is not right for you.
Cache things other than SQL data!
When first plugging memcached into everything you can get your hands on, it may not be obvious that you can or should cache anything other than SQL resultsets. You can, and you should!
If you were building a profile page for display. You might fetch a user's bio section (name, birthdate, hometown, blurb). Then you might format the blurb to replace custom XML tags with HTML, or do some nasty regexes. Instead of caching 'name, birthdate, hometown, blurb' independently, or as one item, cache the renderred output chunk! Then you may simply fetch the pre-procsesed HTML chunk ready for inclusion in the rest of the page, saving precious CPU cycles.
Use a cache hierarchy
In most cases you have the ability to use a localized cache or memcached. We know to use memcached so we may enjoy a massive volume of cached data in a high speed farm, but sometimes it makes sense to go back to your roots a little and maintain multiple levels of cache.
Peter Zaitsev has written about the speed comparisons of PHP's APC over localhost, vs memcached over localhost, and the benefits of using both:
* http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/
* http://www.mysqlperformanceblog.com/2006/09/27/apc-or-memcached/
Often you'll have a very small amount of data (product categories, connection information, server status variables, application config variables), which are accessed on nearly every page load. It makes a lot of sense to cache these as close to the process as possible (or even inside the process, if you can). It can help lower page render time, and increase reliability in case of memcached node failures.
Creating good keys
It's a good idea to use sprintf (), or a similar function, when creating keys. Otherwise, it's easy for null values and boolean values to slip into your keys and these may not work as you expect. e.g. memKey = sprintf ( 'cat:%u', categoryId );
WIP: Mulled this over, need someone with better examples to fill this in. Short keys tend to be good, using prefixes along with an MD5 or short SHA1 can be good, namespace prep is good. What else?
Is memcached atomic?
atomic operation
atomic 的意思是說 就算是某個 process 同時set/get 同個東西, 也會一前一後執行, 且不會讓其他的process 處理他
Of course! Well, lets be specific:
* All individual commands sent to memcached are absolutely atomic. If you send a set and a get in parallel, against the same object, they will not clober each other. They will be serialized and one will be executed before the other. Even in threaded mode, all commands are atomic. If they are not, it's a bug :)
* A series of commands is not atomic. If you issue a 'get' against an item, operate on the data, then wish to 'set' it back into memcached, you are not guaranteed to be the only process working on that value. In parallel, you could end up overwriting a value set by something else.
Memcached 透過兩種hash 來完成, 一個是透過對key 做 hash 來, 找所對應的 server, 第二個就是找其相對應的值了, 當然不同的client implement 這些過程也會有所不同
When doing a memcached lookup, first the client hashes the key against the whole list of servers. Once it has chosen a server, the client then sends its request, and the server does an internal hash key lookup for the actual item data.
For example, if we have clients 1, 2, 3, and servers A, B, C:
Client 1 wants to set key "foo" with value "barbaz". Client 1 takes the full list of servers (A, B, C), hashes the key against them, then lets say ends up picking server B. Client 1 then directly connects to server B, and sets key "foo" with value "barbaz". Next, client 2 wants to get key "foo". Client 2 runs the same client library as client 1, and has the same server list (A, B, C). It is able to use the same hashing process to figure out key "foo" is on server B. It then directly requests key "foo" and gets back "barbaz".
Can I use different size caches across servers and will memcached use the servers with more memory efficiently?
Memcache's hashing algorithm that determines which server a key is cached on does not take into account memory sizes across servers. But a workaround may be to run multiple memcached instances on your server with more memory with each instance using the same size cache as all your other servers.
Memcached is not faster than my database. Why?
In a one to one comparison, memcached may not be faster than your SQL queries. However, this is not its goal. Memached's goal is scalability. As connections and requests increase, memcached will perform better than most database only solutions. Please test your code under high load with simultaneous connections and requests before deciding memcached is not right for you.
Cache things other than SQL data!
When first plugging memcached into everything you can get your hands on, it may not be obvious that you can or should cache anything other than SQL resultsets. You can, and you should!
If you were building a profile page for display. You might fetch a user's bio section (name, birthdate, hometown, blurb). Then you might format the blurb to replace custom XML tags with HTML, or do some nasty regexes. Instead of caching 'name, birthdate, hometown, blurb' independently, or as one item, cache the renderred output chunk! Then you may simply fetch the pre-procsesed HTML chunk ready for inclusion in the rest of the page, saving precious CPU cycles.
Use a cache hierarchy
In most cases you have the ability to use a localized cache or memcached. We know to use memcached so we may enjoy a massive volume of cached data in a high speed farm, but sometimes it makes sense to go back to your roots a little and maintain multiple levels of cache.
Peter Zaitsev has written about the speed comparisons of PHP's APC over localhost, vs memcached over localhost, and the benefits of using both:
* http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/
* http://www.mysqlperformanceblog.com/2006/09/27/apc-or-memcached/
Often you'll have a very small amount of data (product categories, connection information, server status variables, application config variables), which are accessed on nearly every page load. It makes a lot of sense to cache these as close to the process as possible (or even inside the process, if you can). It can help lower page render time, and increase reliability in case of memcached node failures.
Creating good keys
It's a good idea to use sprintf (), or a similar function, when creating keys. Otherwise, it's easy for null values and boolean values to slip into your keys and these may not work as you expect. e.g. memKey = sprintf ( 'cat:%u', categoryId );
WIP: Mulled this over, need someone with better examples to fill this in. Short keys tend to be good, using prefixes along with an MD5 or short SHA1 can be good, namespace prep is good. What else?
Is memcached atomic?
atomic operation
atomic 的意思是說 就算是某個 process 同時set/get 同個東西, 也會一前一後執行, 且不會讓其他的process 處理他
Of course! Well, lets be specific:
* All individual commands sent to memcached are absolutely atomic. If you send a set and a get in parallel, against the same object, they will not clober each other. They will be serialized and one will be executed before the other. Even in threaded mode, all commands are atomic. If they are not, it's a bug :)
* A series of commands is not atomic. If you issue a 'get' against an item, operate on the data, then wish to 'set' it back into memcached, you are not guaranteed to be the only process working on that value. In parallel, you could end up overwriting a value set by something else.
2009年4月8日 星期三
memcache , sphinx utf8 memo
memcache FAQ
節錄一下
去增加或移除你的server 設定是很危險的, 因為這樣你的hash 也會變
memcache object 會對塞進它的key 做hash 來決定這個key mapping 到哪個memcache server, 就算是memcache server 掛掉, 也會rehash 寫到另外的server 所以你只要用memcache object 來set/get , 就算是100台memcache server 也是一樣, 它會自己去找到它應該去找去砍的memcache server
default clients adding or removing servers will invalidate all of your cache! Since the list of servers to hash against has changed, most of your keys will likely hash to different servers. It's like restarting all of your nodes at the same time.
Use the old default "key rehashing" mode. If a memcache client detects a down server, the previous default behavior was to pick another server and set/fetch the data from there. This can be dangerous if you have a server that "flaps" up and down, as you can end up with stale data on broken and working servers.
Ultrasphinx 搜尋不到中文的解決方法
config/database.yml 加一行 encoding: utf8
節錄一下
去增加或移除你的server 設定是很危險的, 因為這樣你的hash 也會變
memcache object 會對塞進它的key 做hash 來決定這個key mapping 到哪個memcache server, 就算是memcache server 掛掉, 也會rehash 寫到另外的server 所以你只要用memcache object 來set/get , 就算是100台memcache server 也是一樣, 它會自己去找到它應該去找去砍的memcache server
default clients adding or removing servers will invalidate all of your cache! Since the list of servers to hash against has changed, most of your keys will likely hash to different servers. It's like restarting all of your nodes at the same time.
Use the old default "key rehashing" mode. If a memcache client detects a down server, the previous default behavior was to pick another server and set/fetch the data from there. This can be dangerous if you have a server that "flaps" up and down, as you can end up with stale data on broken and working servers.
Ultrasphinx 搜尋不到中文的解決方法
config/database.yml 加一行 encoding: utf8
2009年4月7日 星期二
Mysql replication memo(1)
Chapter 16. Replication
Replication 讓資料能夠 複製到多個其他的database server 上, replication 基本上是不同步的, slave 不需要永遠都連著master 來接收變動, 意思是說這樣的機制可以長距離傳輸, 即使是 dial-up service 也行, 根據你的設定你可以複製所有的databases 或者是某些database 甚至是某些tables
Replication 讓資料能夠 複製到多個其他的database server 上, replication 基本上是不同步的, slave 不需要永遠都連著master 來接收變動, 意思是說這樣的機制可以長距離傳輸, 即使是 dial-up service 也行, 根據你的設定你可以複製所有的databases 或者是某些database 甚至是某些tables
mysql replication, 同步 database
MYSQL的master/slave資料同步配置
How to Set Up Replication
Mysql Replication 如何設定覆寫機制
這邊設定的replication, 只有單向, 兩個database, 一個是master 一個是slave
master 資料變動的時候, slave 會跟著變, 反之則否
Step1:
(In master)
在master 上開一個有 replication slave 權限的user 叫 slave_user, password 是 slave_password, 而且只有從 192.168.11.103 來的能用
Step2 :
(In master)
設定/etc/mysql/my.cnf
Step3
(In slave)
設定 my.cnf
Step4
(In master and slave)
在同步之前, 要先把master 已經有的資料弄到slave 上
Step5:
(In master and slave) master 和 slave 兩邊的mysql 都要重起
/etc/init.d/mysql restart
How to Set Up Replication
Mysql Replication 如何設定覆寫機制
這邊設定的replication, 只有單向, 兩個database, 一個是master 一個是slave
master 資料變動的時候, slave 會跟著變, 反之則否
Step1:
(In master)
在master 上開一個有 replication slave 權限的user 叫 slave_user, password 是 slave_password, 而且只有從 192.168.11.103 來的能用
Step2 :
(In master)
設定/etc/mysql/my.cnf
Step3
(In slave)
設定 my.cnf
Step4
(In master and slave)
在同步之前, 要先把master 已經有的資料弄到slave 上
Step5:
(In master and slave) master 和 slave 兩邊的mysql 都要重起
/etc/init.d/mysql restart
javascript & document.write
[問題]javascript中的document.write
document.write 表示你要重寫整個頁面,所以第一個指令執行後,整個頁面已予清除,第二個指令也就不見了
另外,JS 語法也不正確,應該是
document.open();
document.write(...);
:
document.close();
如果不要清除網頁,可用:
text = document.createTextNode("blah blah blah");
document.body.appendChild(text);
document.write 表示你要重寫整個頁面,所以第一個指令執行後,整個頁面已予清除,第二個指令也就不見了
另外,JS 語法也不正確,應該是
document.open();
document.write(...);
:
document.close();
如果不要清除網頁,可用:
text = document.createTextNode("blah blah blah");
document.body.appendChild(text);
2009年4月4日 星期六
ym4r memo(3)
GMarkerManager
#有時候你要秀上千個marker的時候, 你可以利用這個來指定在哪個zoom 的範圍下, 才會有
It is a recent (v2.67) GMaps API class that manages the efficient display of potentially thousands of markers. It is similar to the Clusterer (see below) since markers start appearing at specified zoom levels. The clustering behaviour has to be managed explicitly though by specifying the cluster for smaller zoom levels and specify the expanded cluster for larger zoom levels and so on. Note that it is not an overlay and is not added to the map through an overlay_init call.
===Clusterer
也是用在markers 很多的情況下, 讓它小到在只有某個數目的markers的時候才會出現
A Clusterer is a type of overlay that contains a potentially very large group of markers. It is engineered so markers close to each other and undistinguishable from each other at some level of zoom, appear as only one marker on the map at that level of zoom. Therefore it is possible to have a very large number of markers on the map at the same time and not having the map crawl to a halt in order to display them.
===GeoRss Overlay
從rss 拿 位置
An group of markers taken from a Rss feed containing location information in the W3C basic geo (WGS83 lat/lon) vocabulary and in the Simple GeoRss format
===Adding new map types
沒看 XD
It is now possible to easily add new map types, on top of the already existing ones, like G_SATELLITE_MAP or G_NORMAL_MAP. The imagery for these new map types can come from layers of the standard map types or can be taken either from a WMS server or from pretiled images on a server (that can be generated with a tool that comes with the YM4R gem: refer to the README of the gem to know more about it).
===Local Search
試不出來 @@
Local Search control has been added to the map and control objects. There are two places you need to implement it to get it to work. This is because the local search control needs an additional library added to the import as well as the control added to the map.
#有時候你要秀上千個marker的時候, 你可以利用這個來指定在哪個zoom 的範圍下, 才會有
It is a recent (v2.67) GMaps API class that manages the efficient display of potentially thousands of markers. It is similar to the Clusterer (see below) since markers start appearing at specified zoom levels. The clustering behaviour has to be managed explicitly though by specifying the cluster for smaller zoom levels and specify the expanded cluster for larger zoom levels and so on. Note that it is not an overlay and is not added to the map through an overlay_init call.
===Clusterer
也是用在markers 很多的情況下, 讓它小到在只有某個數目的markers的時候才會出現
A Clusterer is a type of overlay that contains a potentially very large group of markers. It is engineered so markers close to each other and undistinguishable from each other at some level of zoom, appear as only one marker on the map at that level of zoom. Therefore it is possible to have a very large number of markers on the map at the same time and not having the map crawl to a halt in order to display them.
===GeoRss Overlay
從rss 拿 位置
An group of markers taken from a Rss feed containing location information in the W3C basic geo (WGS83 lat/lon) vocabulary and in the Simple GeoRss format
===Adding new map types
沒看 XD
It is now possible to easily add new map types, on top of the already existing ones, like G_SATELLITE_MAP or G_NORMAL_MAP. The imagery for these new map types can come from layers of the standard map types or can be taken either from a WMS server or from pretiled images on a server (that can be generated with a tool that comes with the YM4R gem: refer to the README of the gem to know more about it).
===Local Search
試不出來 @@
Local Search control has been added to the map and control objects. There are two places you need to implement it to get it to work. This is because the local search control needs an additional library added to the import as well as the control added to the map.
ym4r memo(2)
ym4r in github
GMarkerGroup
To use it you would have to include in your HTML template the JavaScript file markerGroup.js after the call
to GMap.header (because it extends the GOverlay class). You should have something like that in your
template:
<%= javascript_include_tag("markerGroup") %>
It is useful in 2 situations:
- Display and undisplay a group of markers without referencing all of them. You just declare your marker group globally and call +activate+ and +deactivate+ on this group in response, for example, to clicks on links on your page. - Keeping an index of markers, for example, in order to show one of these markers in reponse to a click on a link (the way Google Local does with the results of the search).
還有其他function 都在 public/javascript 的markerGroup.js 裡面還有更多其他有用的function
You can call +activate+ and +deactivate+ to display or undisplay a group of markers. You can add new markers with addMarker(marker,id). Again if you don't care about referencing the marker, you don't need to pass an id. If the marker group is active, the newly added marker will be displayed immediately. Otherwise it will be displayed the next time the group is activated. Finally, since it is an overlay, the group will be removed when calling clearOverlays on the GMap object.
You can center and zoom on all the markers in the group by calling GMarkerGroup#centerAndZoomOnMarkers() after the group has been added to a map. So for example, if you would want to do that at initalization time, you would do the following assuming your marker group has been declared as +group+): @map.record_init group.center_and_zoom_on_markers
GMarkerGroup
To use it you would have to include in your HTML template the JavaScript file markerGroup.js after the call
to GMap.header (because it extends the GOverlay class). You should have something like that in your
template:
<%= javascript_include_tag("markerGroup") %>
It is useful in 2 situations:
- Display and undisplay a group of markers without referencing all of them. You just declare your marker group globally and call +activate+ and +deactivate+ on this group in response, for example, to clicks on links on your page. - Keeping an index of markers, for example, in order to show one of these markers in reponse to a click on a link (the way Google Local does with the results of the search).
還有其他function 都在 public/javascript 的markerGroup.js 裡面還有更多其他有用的function
You can call +activate+ and +deactivate+ to display or undisplay a group of markers. You can add new markers with addMarker(marker,id). Again if you don't care about referencing the marker, you don't need to pass an id. If the marker group is active, the newly added marker will be displayed immediately. Otherwise it will be displayed the next time the group is activated. Finally, since it is an overlay, the group will be removed when calling clearOverlays on the GMap object.
You can center and zoom on all the markers in the group by calling GMarkerGroup#centerAndZoomOnMarkers() after the group has been added to a map. So for example, if you would want to do that at initalization time, you would do the following assuming your marker group has been declared as +group+): @map.record_init group.center_and_zoom_on_markers
2009年4月3日 星期五
ym4r memo (1)
ym4r in github
1. 如果你的website 會用到不同的domain 時, 就需要在設定裡面特別去指定各個host 的key, 然後在GMap.header 裡, 加上:host 的參數, 如
you may need to have multiple keys
(for example if your app will be accessible from multiple URLs, for which you need different keys), you can also associate a hash to the environment, whose keys will be the different hosts. In this case, you will need to pass a value to the :host key when calling the method GMap.header(usually @request.host.
2. 也可以一次用兩張地圖, 不過你這時你就必須特別指定global variable
另一個在controller 必要的設定是設定地圖中心和 zoom
3. to_html 可以傳 :full => true 來讓它變全螢幕
4. GMarker 可以吃很多種參數
You can pass options to the GMarker to customize the info window (:info_window or :info_window_tabs options), the tooltip (:title option) or the icon used (:icon option).
For example:
5. GPolyline
GPolylines are colored lines on the map. The constructor takes as argument a list of GLatLng or a list of 2-element
arrays, which will be transformed into GLatLng for you. It can also take the color (in the #rrggbb form), the
weight (an integer) and the opacity. These arguments are optional though.
For example:
6. GPolygon
The constructor takes as argument a list of GLatLng or a list of 2-element arrays, which will be transformed into GLatLng for you. Note that for polygons, the last point must be equal to the first, in order to have a closed loop. It can also take the color (in the #rrggbb form) of the stroke, the weight of the stroke, the opacity of the stroke, as well as the color of the fill and the opacity. These arguments are optional though.
For example:
1. 如果你的website 會用到不同的domain 時, 就需要在設定裡面特別去指定各個host 的key, 然後在GMap.header 裡, 加上:host 的參數, 如
you may need to have multiple keys
(for example if your app will be accessible from multiple URLs, for which you need different keys), you can also associate a hash to the environment, whose keys will be the different hosts. In this case, you will need to pass a value to the :host key when calling the method GMap.header(usually @request.host.
2. 也可以一次用兩張地圖, 不過你這時你就必須特別指定global variable
另一個在controller 必要的設定是設定地圖中心和 zoom
3. to_html 可以傳 :full => true 來讓它變全螢幕
4. GMarker 可以吃很多種參數
You can pass options to the GMarker to customize the info window (:info_window or :info_window_tabs options), the tooltip (:title option) or the icon used (:icon option).
For example:
5. GPolyline
GPolylines are colored lines on the map. The constructor takes as argument a list of GLatLng or a list of 2-element
arrays, which will be transformed into GLatLng for you. It can also take the color (in the #rrggbb form), the
weight (an integer) and the opacity. These arguments are optional though.
For example:
6. GPolygon
The constructor takes as argument a list of GLatLng or a list of 2-element arrays, which will be transformed into GLatLng for you. Note that for polygons, the last point must be equal to the first, in order to have a closed loop. It can also take the color (in the #rrggbb form) of the stroke, the weight of the stroke, the opacity of the stroke, as well as the color of the fill and the opacity. These arguments are optional though.
For example:
2009年4月2日 星期四
How to use ym4r drag and move marker in google map
Esa's Google Maps API examples
Integrating Google Maps into Your Web Applications
google map api
[Ym4r-user] marker event listeners
A port of the YM4R-GM plugin for rails
If we want to drag or move a marker, we should set the global variable for marker when we create a marker or this marker would be anonymous one
application.js
這樣會生出這樣的script
下面就是如果沒有@map.overlay_global_init(@gmarker, "gmarker"), 沒有給marker 指定變數的結果
Integrating Google Maps into Your Web Applications
google map api
[Ym4r-user] marker event listeners
A port of the YM4R-GM plugin for rails
If we want to drag or move a marker, we should set the global variable for marker when we create a marker or this marker would be anonymous one
application.js
這樣會生出這樣的script
下面就是如果沒有@map.overlay_global_init(@gmarker, "gmarker"), 沒有給marker 指定變數的結果
訂閱:
文章 (Atom)