Archive for the 'Technology - High-Performance Servers' Category


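Using Compression Tools to Improve Web Application Performance

    For website performance optimization, the Yahoo Performance Team's rules for high performance web sites are a good tuning guide (as are the 14 rules for faster pages that Yahoo's Steve Souders presents in his book High Performance Web Sites). The Firefox add-on Firebug and YSlow (http://developer.yahoo.com/yslow/), Yahoo's extension for Firebug, are also good tools for performance tuning.

    One important rule in these guides is to compress JS, CSS and similar files with gzip or deflate to reduce the network bandwidth they consume.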

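1. How HTTP compression works

    HTTP compression is a protocol that the web server (or application server) and the browser follow together, so both sides must support it. All popular browsers (IE, Firefox, Opera) support it, as do web and application servers such as Lighttpd, Apache, Nginx, IIS and Tomcat. The two sides negotiate as follows:

  1. The browser requests a URL and sets the Accept-Encoding request header to gzip, deflate, indicating that it supports both the gzip and the deflate encodings (both use the same DEFLATE compression algorithm and differ only in their wrapper format);
  2. On receiving the request, the web server checks whether the browser supports compression. If it does, the server sends a compressed response labelled with a Content-Encoding header; otherwise it sends the content uncompressed;
  3. On receiving the response, the browser checks whether the content is compressed. If it is, the browser decompresses it, and then it renders the page.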
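The three-step negotiation above can be sketched in Python with the standard `gzip` module. This is a minimal illustration, not how any particular server implements it. `accepts_gzip`, `build_response` and `client_decode` are hypothetical helpers. The Accept-Encoding parser is simplified: it honours an explicit `q=0` but ignores `*` and other weighting. A real server would also stream the body.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if an Accept-Encoding value allows gzip (simplified parser)."""
    for part in accept_encoding.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != "gzip":
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0  # "gzip;q=0" means the client refuses gzip
    return False


def build_response(body: bytes, accept_encoding: str):
    """Server side (step 2): compress only if the browser advertised gzip."""
    headers = {"Vary": "Accept-Encoding"}  # caches must key on this header
    if accepts_gzip(accept_encoding):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


def client_decode(headers: dict, body: bytes) -> bytes:
    """Browser side (step 3): decompress if the response says it is gzipped."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


page = b"<html><body>" + b"<p>hello</p>" * 500 + b"</body></html>"
headers, wire = build_response(page, "gzip, deflate")
print(headers["Content-Encoding"], len(page), "->", len(wire))
```

The `Vary: Accept-Encoding` header is included so that intermediate caches do not serve a gzipped copy to a client that did not ask for one.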

    You can watch the actual HTTP exchange with LiveHTTPHeaders.

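2. Ways to compress web content

Compressing JS and CSS files involves two broad approaches:

  • Use tools such as YUI Compressor or JsPacker to shrink JS and CSS files. These tools mainly remove unneeded characters such as blank lines, line breaks and comments, which reduces the size of the file itself. JavaScript libraries such as jQuery and Prototype are prepared for release this way.
  • Use the gzip/deflate support built into the application (web) server and the browser to compress responses on the fly.

    In practice the two approaches should be combined. Large deployments put a web server such as lighttpd or Apache in front, and gzip/deflate support can be configured there. This post only briefly shows how to use JBoss's gzip support to improve web application performance.

3. Configuring JBoss to support gzip compression

    This uses JBoss 4.2.2, whose servlet container is Tomcat. Configuring JBoss for gzip therefore really means changing the Tomcat configuration.

    Edit jboss-4.2.2.GA/server/default/deploy/jboss-web.deployer/server.xml and add the following: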

    <Connector port="80" address="0.0.0.0"
         maxThreads="1500" maxHttpHeaderSize="8192"
         emptySessionPath="true" protocol="HTTP/1.1"
         enableLookups="false" redirectPort="8443" acceptCount="100"
         connectionTimeout="20000" disableUploadTimeout="true"
         compression="on"
         compressableMimeType="text/html,text/xml,text/plain,text/css,text/javascript,application/xhtml+xml,application/x-javascript,application/javascript,text/xhtml"
        />

    The gzip-related Tomcat Connector attributes are listed below. For details, see

http://www.jboss.org/file-access/default/members/jbossweb/freezone/docs/2.1.0/config/printer/http.html

or:

http://tomcat.apache.org/tomcat-5.5-doc/config/http.html

  • compressableMimeType

    The value is a comma separated list of MIME types for which HTTP compression may be used. The default value is text/html,text/xml,text/plain.

  • compression

    The Connector may use HTTP/1.1 GZIP compression in an attempt to save server bandwidth. The acceptable values for the parameter is “off” (disable compression), “on” (allow compression, which causes text data to be compressed), “force” (forces compression in all cases), or a numerical integer value (which is equivalent to “on”, but specifies the minimum amount of data before the output is compressed). If the content-length is not known and compression is set to “on” or more aggressive, the output will also be compressed. If not specified, this attribute is set to “off”.

  • noCompressionUserAgents

  The value is a comma separated list of regular expressions matching user-agents of HTTP clients for which compression should not be used, because these clients, although they do advertise support for the feature, have a broken implementation. The default value is an empty String (regexp matching disabled).
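The Python sketch below shows how these attributes interact. It paraphrases the documentation quoted above and is not Tomcat's source code. Two details are assumptions. First, the size threshold used when `compression="on"` is a parameter defaulting to 2048 bytes, which is the documented `compressionMinSize` default in later Tomcat versions. Second, `force` is taken to skip every other check, as "forces compression in all cases" suggests.

```python
import re


def should_compress(compression, mime_type, content_length, user_agent,
                    client_accepts_gzip,
                    compressable_mime_types="text/html,text/xml,text/plain",
                    no_compression_user_agents="",
                    default_min_size=2048):
    """Return True if a response should be gzipped.

    compression: "off", "on", "force", or a number (as a string) that acts
    like "on" with that many bytes as the minimum size.
    content_length: body size in bytes, or None when it is not known.
    """
    if compression == "off":
        return False
    if compression == "force":
        return True
    min_size = default_min_size if compression == "on" else int(compression)
    if not client_accepts_gzip:
        return False
    # Compare the bare MIME type, ignoring parameters such as "; charset=UTF-8".
    bare_type = mime_type.split(";")[0].strip().lower()
    allowed = {m.strip().lower() for m in compressable_mime_types.split(",")}
    if bare_type not in allowed:
        return False
    # noCompressionUserAgents: comma-separated regexes; a full match disables gzip.
    for pattern in no_compression_user_agents.split(","):
        if pattern.strip() and re.fullmatch(pattern.strip(), user_agent):
            return False
    if content_length is None:  # unknown length: compressed, per the docs
        return True
    return content_length >= min_size
```

The JBoss configuration above would correspond to `compression="on"` with its longer MIME list passed as `compressable_mime_types`.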

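With gzip enabled, a typical example is the dojo.js used by Struts 2. It shrinks from 258 KB to only 72 KB, about 28% of its original size, which is a significant saving.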

4. Debugging

    For web development, Firefox provides better tools than IE and other browsers:

    Firebug:http://www.getfirebug.com/

    Yslow:http://developer.yahoo.com/yslow

    LiveHTTPHeaders: http://livehttpheaders.mozdev.org/

    Web Developer: http://chrispederick.com/work/web-developer/

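4.1. Using YSlow to review recommendations for the site

[Screenshot: YSlow]

4.2. Using Firebug to inspect page requests

[Screenshot: Firebug]

Also, today is the official release day of Firefox 3, so download it to show your support. Thanks to Firefox for such good tools.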

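Technical Thoughts on Building a Wireless Value-Added Services Portal



    Business operation and support systems such as eSales serve mostly dynamic content. They have relatively few concurrent users and relatively low load, so with a sound design performance is not the main bottleneck, and dynamic pages are an appropriate choice. For portals and communities, by contrast, high concurrency, high load, high performance and high availability come first, and every available technique must be used to improve performance. The best methodology for website optimization is Yahoo's Best Practices for Speeding Up Your Web Site; refer to it for the technical details of the optimization strategies.

    We also live in a user-centered era: "user-centered design", "user-centered systems", "user-centered marketing" and so on. How can a portal design take user experience fully into account and avoid fatal bad experiences ("bad smells")? This is a key question in building the portal, and interaction design patterns can help here. For user interaction design patterns, see the Yahoo Design Pattern Library and the Interaction Design Pattern Library.

    The points below focus on the technical aspects to consider when developing the portal:

    Web 2.0: The portal should make full use of typical Web 2.0 elements such as tags, RSS, Digg, SNS and UGC. Beyond that, user experience is the top priority in building the portal. We can create a great mobile Internet portal only by centering the design on user experience and blending these Web 2.0 elements appropriately into the wireless value-added services portal. Otherwise the result is just another pile of imported features identical to everyone else's.

    Unified wireless and Internet portals: The WAP portal and the Internet portal share the same technical architecture. The overall infrastructure keeps the current Struts2 + Spring + Hibernate (iBATIS) stack, but the view layer uses FreeMarker instead of JSP. FreeMarker's template support and good XML support allow WML (WAP 1.0), XHTML (WAP 2.0) and HTML (Internet) to be handled within one architecture.

    REST (Representational State Transfer): Follow REST design principles, especially stateless communication (statelessness).

    Static pages: Portal and community pages are published as static pages wherever possible. This makes full use of caching and enables replication, load balancing and mirroring (for example, deployment on both the northern and southern telecom networks). The static-page strategy is implemented with FreeMarker + FMPP.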
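Page staticization means rendering templates to files once, at publish time, so that the web server and caches only ever see static HTML. The post uses FreeMarker + FMPP for this. The sketch below swaps in Python's `string.Template` purely for illustration. The template, the `publish_static_pages` function and the article fields (`id`, `title`, `body`) are hypothetical.

```python
import html
import pathlib
import tempfile
from string import Template

# Hypothetical page template; in the post this role is played by FreeMarker.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><div>$body</div></body></html>"
)


def publish_static_pages(articles, out_dir):
    """Render each article to <out_dir>/<id>.html and return the written paths."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for article in articles:
        page = PAGE.substitute(
            title=html.escape(article["title"]),  # escape text before embedding it
            body=html.escape(article["body"]),
        )
        target = out / f"{article['id']}.html"
        # Write a temporary file, then rename it over the target, so the web
        # server never serves a half-written page.
        tmp_path = target.with_suffix(".tmp")
        tmp_path.write_text(page, encoding="utf-8")
        tmp_path.replace(target)
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp_dir:
    written = publish_static_pages([{"id": 1, "title": "A & B", "body": "<hi>"}], tmp_dir)
    page_text = written[0].read_text(encoding="utf-8")
    leftovers = list(pathlib.Path(tmp_dir).glob("*.tmp"))
```

The generated directory is then what rsync mirrors and what Squid or the front-end web server serves.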

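    SEO: The design must first consider search-engine friendliness and optimization for Google and Baidu, mainly the meta tags and the site architecture. All page URLs follow the REST style. Where that is not possible, lighttpd's mod_rewrite is used.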
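Here is a sketch of the URL mapping that mod_rewrite performs, written in Python so the rules can be tested. REST-style public URLs are matched by regular expression and rewritten to the internal dynamic URLs. The rule table and the `.action` targets are hypothetical examples. The first matching rule wins, which mirrors the behaviour of lighttpd's `url.rewrite-once`.

```python
import re

# Hypothetical rules: (public REST-style path pattern, internal target).
REWRITE_RULES = [
    (re.compile(r"^/news/(\d+)$"), r"/news.action?id=\1"),
    (re.compile(r"^/channel/(\w+)/page/(\d+)$"), r"/channel.action?name=\1&page=\2"),
]


def rewrite(path: str) -> str:
    """Return the internal URL for path; the first matching rule wins."""
    for pattern, target in REWRITE_RULES:
        match = pattern.match(path)
        if match:
            return match.expand(target)  # substitute \1, \2 from the match
    return path  # no rule matched: already a real (e.g. static) URL


print(rewrite("/news/42"))
```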

    Crawlers: Simple vertical crawlers are implemented mainly with HttpClient + HTMLParser. More complex crawling strategies use Heritrix (or Nutch).
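The core loop of a simple vertical crawler is: fetch a page, extract its links, stay on the target site, and repeat. The post does this in Java with HttpClient + HTMLParser. The sketch below shows the same loop with Python's standard `html.parser` instead. The fetch function is passed in (here a dictionary of canned pages) so the sketch runs without network access; in practice it would wrap an HTTP client. A real crawler also needs politeness delays and robots.txt handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect absolute URLs (without #fragments) from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                    self.links.append(absolute)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl restricted to the start URL's host."""
    host = urlparse(start_url).netloc
    queue, visited, order = deque([start_url]), set(), []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        extractor = LinkExtractor(url)
        extractor.feed(fetch(url))
        for link in extractor.links:
            if urlparse(link).netloc == host and link not in visited:
                queue.append(link)
    return order


pages = {
    "http://example.com/": '<a href="/a">A</a> <a href="http://other.org/">x</a>',
    "http://example.com/a": '<a href="/">home</a> <a href="b#top">B</a>',
    "http://example.com/b": "<p>leaf</p>",
}
visited_order = crawl("http://example.com/", lambda u: pages.get(u, ""))
```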

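    Search: Nutch + Lucene + Compass, which index the portal on a schedule to provide site-wide search.

    AJAX: In the eSales back office, AJAX is mainly implemented with Struts 2's Dojo support, which makes full use of Struts 2 tags and keeps the architecture uniform. The portal consists mostly of static pages, so AJAX is used to load data wherever dynamic content is needed. For the AJAX library, Dojo is dropped in favor of jQuery.

    CSS: The eSales back office still mainly uses frames and tables for page layout. The portal uses pure CSS layout to keep layouts flexible and pages small.

    Cache: Use caches such as memcached and Squid wherever possible to improve performance.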
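The usual way to put memcached in front of the database is the cache-aside pattern: read from the cache, and on a miss load from the database and store the result with an expiry time. The sketch below uses an in-process dictionary as a stand-in for memcached so it is self-contained. `TTLCache` and `get_or_load` are hypothetical names, and the injectable clock exists only to make expiry testable.

```python
import time


class TTLCache:
    """Minimal stand-in for memcached: values expire ttl seconds after set()."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


def get_or_load(cache, key, loader):
    """Cache-aside read: try the cache, fall back to loader on a miss."""
    value = cache.get(key)
    if value is None:
        value = loader(key)
        cache.set(key, value)
    return value


now = [0.0]
db_calls = []


def load_from_db(key):
    db_calls.append(key)  # stands in for a real database query
    return f"row for {key}"


cache = TTLCache(ttl=60, clock=lambda: now[0])
first = get_or_load(cache, "user:7", load_from_db)
second = get_or_load(cache, "user:7", load_from_db)  # served from the cache
now[0] = 61.0
third = get_or_load(cache, "user:7", load_from_db)   # expired, reloaded
```

One limitation of this simple version is that a loader returning `None` is indistinguishable from a miss, so such results are never cached.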

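     Mirroring: rsync is used to mirror and synchronize static page content. This evens out differences in access speed for users on different carriers (China Mobile (China Tietong), China Telecom, China Unicom (China Netcom), CERNET, cable networks) and in different regions.

    For other deployment strategies, see the figure below:

    [Figure: Platform system deployment plan]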

 

References:

  • Performance:

http://developer.yahoo.com/performance/rules.html

http://highscalability.com/

High Performance Web Sites

Building Scalable Web Sites

http://www.sitepoint.com/print/web-site-optimization-steps

 

  • User experience:

http://www.welie.com/patterns/index.php

http://developer.yahoo.com/ypatterns/atoz.php

http://en.wikipedia.org/wiki/Interaction_design_pattern

http://www.visi.com/~snowfall/InteractionPatterns.html

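Notes on Tuning a MySQL Database Stuck at 99.9% CPU

  The system at my new company had never been stable: store sales staff often reported that they could not log in or that the system was extremely slow. I suspected connection leaks and memory leaks in the code. With only a few days left before Spring Festival there was no time to tune the code, so I could only look for configuration-level measures to relieve the load on the server for the time being and reduce the failure rate. To learn about failures immediately, I set up server monitoring based on Nagios, which sends SMS alerts when something goes wrong so that it can be fixed promptly.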

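1. System environment:

  Operating system: Red Hat AS4

  Database: MySQL 4.1.18

  Application server: JBoss 3.2.7

  Server: 4 x 3.00 GHz Intel Xeon CPUs

  The database and the application server are deployed on the same server.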

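  A quick investigation showed that memory and I/O load were normally light, and there were not many database connections. Strangely, though, MySQL's CPU usage was constantly at 99.9% while the system as a whole still ran reasonably fast. I first suspected that the JVM, the database parameters and the indexes had not been tuned, so I started by adjusting the JVM and database parameters.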

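2. JVM tuning

  • Adjust the JVM parameters

  JAVA_OPTS="$JAVA_OPTS -Xms512m -Xmx1024m -server -XX:MaxPermSize=300m -XX:MaxNewSize=300m"

  • Adjust the JBoss database connection pool: change the maximum number of connections and the idle connection reclaim time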

<min-pool-size>20</min-pool-size>

<max-pool-size>300</max-pool-size>

<idle-timeout-minutes>1</idle-timeout-minutes>

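The sketch below models what `min-pool-size`, `max-pool-size` and `idle-timeout-minutes` control. The minimum is the number of connections kept open, the maximum caps concurrent connections, and the idle timeout sets how long an unused connection may sit before it is closed. It is a simplified illustration, not JBoss's pool. Timeouts are in seconds, the connection factory is a placeholder, and an exhausted pool fails immediately, whereas a real pool normally waits for a while first.

```python
import threading
import time


class SimplePool:
    def __init__(self, factory, min_size, max_size, idle_timeout,
                 clock=time.monotonic):
        self.factory = factory
        self.min_size = min_size
        self.max_size = max_size
        self.idle_timeout = idle_timeout
        self.clock = clock
        self._lock = threading.Lock()
        self._in_use = 0
        # Idle connections with the time they were returned, oldest first.
        self._idle = [(factory(), clock()) for _ in range(min_size)]

    def acquire(self):
        with self._lock:
            if self._idle:
                conn, _ = self._idle.pop()   # reuse the most recently returned
            elif self._in_use < self.max_size:
                conn = self.factory()        # grow up to max_size
            else:
                raise RuntimeError("pool exhausted")
            self._in_use += 1
            return conn

    def release(self, conn):
        with self._lock:
            self._in_use -= 1
            self._idle.append((conn, self.clock()))

    def reap_idle(self, close):
        """Close connections idle longer than idle_timeout, keeping min_size."""
        now = self.clock()
        closed = 0
        with self._lock:
            while (self._idle
                   and self._in_use + len(self._idle) > self.min_size
                   and now - self._idle[0][1] >= self.idle_timeout):
                conn, _ = self._idle.pop(0)
                close(conn)
                closed += 1
        return closed


now = [0.0]
created = []


def make_conn():
    created.append(object())  # placeholder for a real database connection
    return created[-1]


pool = SimplePool(make_conn, min_size=2, max_size=3, idle_timeout=60,
                  clock=lambda: now[0])
held = [pool.acquire() for _ in range(3)]
try:
    pool.acquire()
    exhausted = False
except RuntimeError:
    exhausted = True
for conn in held:
    pool.release(conn)
now[0] = 61.0
closed_count = pool.reap_idle(close=lambda conn: None)
```

With `idle-timeout-minutes` set to 1, idle connections above the minimum are closed after about a minute. That frees database resources quickly, at the cost of reopening connections more often under bursty load.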

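3. Database tuning

  • Optimize all tables and add indexes.

I found a handy MySQL tool, Navicat, which I find easier to use than EMS. It makes adding indexes much more convenient.

  • Adjust the MySQL parameters

The parameters had originally been adapted from my-medium.cnf. I was worried that the sort buffers and similar areas might be too small for queries over large data sets, and that the application might be leaking memory, so I switched to a configuration based on my-huge.cnf.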

[client]
port            = 3306
socket          = /var/lib/mysql/mysql.sock
[mysqld]
port            = 3306
socket          = /var/lib/mysql/mysql.sock
skip-locking
key_buffer = 256M
max_allowed_packet = 1M
table_cache = 256
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 2M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size= 32M
thread_concurrency = 8
max_connections=300

#skip-networking

#log-bin

server-id       = 1

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
#safe-updates

[isamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M

[myisamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout

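After these changes the system held up for about 3 days. Apart from MySQL's CPU usage staying at 99%, it ran essentially normally, and being busy with other things I did not follow up. Unexpectedly, on the first day of the Lunar New Year I received a pile of alert text messages. Checking the system settings, I found that the server had no swap space at all. Hoping for a moment that this was the cause, I created a swap area as a stopgap.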

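4. Adding swap space

  • Create a file of about 1 GB (2 x 500 MB) under /swap

     # mkdir /swap

  # dd if=/dev/zero of=/swap/swapfile bs=500M count=2

  • Format the file as swap space

  # mkswap /swap/swapfile

  • Enable the swap file

  # swapon /swap/swapfile

  • Check the active swap areas

  # swapon -s

  • Add the new swap file to /etc/fstab so that it is enabled automatically at boot

  # vi /etc/fstab

  /swap/swapfile swap swap defaults 0 0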

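    After adding swap and restarting the applications and services, MySQL's CPU usage was still stuck at 99.9%, and after running for a while users again could not log in. Logging in remotely, I found that memory, I/O and swap usage were all normal, as was the number of database connections. Moreover, after stopping MySQL and JBoss, restarting JBoss right away failed; it only started successfully after waiting a while. This made me suspect that file handles and TCP connections had not yet been released. Recalling problems I had run into before, I suspected the operating system's limit on open file handles.

Running ulimit -a | grep open gave:

open files                      (-n) 1024

Running cat /proc/sys/fs/file-max gave:

379816

The database and JBoss are deployed on the same server, and lsof -u root | wc -l already showed more than 700 handles for the root user under light load. The per-user limit of 1024 is therefore too small under heavy load.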
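The same checks can also be made from inside a process. The Python sketch below is Linux only and illustrates the idea rather than forming part of the original fix. It counts the descriptors a process holds via `/proc/<pid>/fd`, reads the soft and hard `RLIMIT_NOFILE` limits that `ulimit -n` reports, and raises the soft limit toward a target without exceeding the hard limit. Like `ulimit -n` in a shell profile, `setrlimit` affects only the current process and the processes it starts afterwards.

```python
import os
import resource


def open_fd_count(pid="self"):
    """Number of open descriptors of a process (includes the one listdir uses)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def target_soft_limit(requested, hard):
    """The soft limit may not exceed the hard limit unless that is unlimited."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)


def raise_nofile_limit(requested=65535):
    """Raise this process's soft open-files limit; never lowers it."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY:
        new_soft = target_soft_limit(requested, hard)
        if new_soft > soft:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open fds: {open_fd_count()}, soft limit: {soft}, hard limit: {hard}")
```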

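5. Raising the operating system's file handle limits

5.1. Raising the system-wide limit

  • Edit /etc/sysctl.conf

    and add fs.file-max = 8061540

  • Add the following to /etc/pam.d/login

    session     required      /lib/security/pam_limits.so

  • Add the following to /etc/security/limits.conf
    root              -        nofile           1006154

  This sets the file handle limit of the root user (both hard and soft) to 1006154.

  • Edit /etc/rc.local and add
    echo 8061540 > /proc/sys/fs/file-max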

 

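5.2. Raising the per-user limit

Rebooting the server was risky, so for now I only edited /root/.bash_profile of root, the user that starts JBoss, and added:

ulimit -n 65535

Then I restarted JBoss and MySQL.

After several days of observation, the constant 99.9% CPU usage had gone away. I am still monitoring it.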

 

6. References

http://www.bea.com.cn/support_pattern/Too_Many_Open_Files_Pattern.html

http://kbase.redhat.com/faq/FAQ_80_1540.shtm

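Study of the eBay E-Commerce Platform (2): eBay Architecture

4. eBay system architecture

4.1. Architecture metrics

Building a highly scalable architecture is every architect's slogan, but how many actually keep that promise? I have never had a clear answer myself to what architectural scalability really means. In "You Scaled Your What?", a post on his blog, eBay architect Dan Pritchett gives an insightful account of the dimensions of scalability: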

  • Transactional
  • Data
  • Operational
  • Deployability
  • Productivity
  • Feature TTM

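Particularly notable is that he counts operability and deployability among the key dimensions of scalability, a very practical lesson from experience. Most Internet companies deliver "software as a service", so operational efficiency and operating cost are among their core competitive strengths. Requirements change frequently, iterations are short, and many servers must be deployed. Under those conditions, the ability to ship code quickly and safely without disrupting production directly determines how promptly the company can respond to changing requirements, and the quality of its service. Maintainability, operability and deployment requirements must therefore be considered throughout architecture design and development.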

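4.2. Architecture goals

High availability, reliability, scalability and security: support seamless growth, and keep large databases and the code base scalable

High maintainability and faster product delivery: deliver high-quality features at an accelerating pace, and further streamline and optimize eBay's development model

Architect for the future: support 10x growth and rapid business innovation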

 

 

4.3. Architecture


 

 

4.4. Software design pattern practices

http://au.sun.com/events/dev_forum/files/best_practices.pdf

http://gceclub.sun.com.cn/java_one_online/2003/TS-3264CHI(USA,2003)/ts3264ch.pdf

http://blog.spiralarm.com/richard/2006/12/billion-hits-a-day-ebay-javaone.pdf


 

5. Architecture best practices

• Scale Out, Not Up
– Horizontal scaling at every tier.
– Functional decomposition.
• Prefer Asynchronous Integration
– Minimize availability coupling.
– Improve scaling options.
• Virtualize Components
– Reduce physical dependencies.
– Improve deployment flexibility.
• Design for Failure
– Automated failure detection and notification.
– “Limp mode” operation of business feature

 

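5.1. Data Tier

5.1.1. Functional Segmentation

  Segmenting database data by function (presumably along the domain or entity model of the use cases) spreads data that used to live on one database server across several database servers. Examples include a User database, an Item database and an Account database; eBay has more than 70 such functional categories. Functional segmentation decouples functions and keeps them independent of one another. When segmenting, data should be divided according to characteristics such as how often each function is used and how it needs to scale. A typical example is putting OLTP and OLAP functions on separate servers.

  Note that eBay partitions both its application servers and its database data by use case. This is a good criterion, although defining the right granularity of use cases takes experience and skill.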

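5.1.2. Horizontal Split

  Data is split horizontally along its so-called "primary access path", and there are several patterns for doing so. For example, writes go to the master database and reads go to the slaves; or data is accessed by segment (by key, or via a map to the data location).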
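Both patterns named above can be combined in a small router. A key (for example a user or item id) picks a shard by a stable hash, writes go to that shard's master, and reads are spread round-robin over its slaves. This is a hypothetical sketch, not eBay's code. The DSN strings are placeholders, and CRC-32 is used instead of Python's built-in `hash()` because the latter is randomized per process for strings.

```python
import itertools
import zlib


class ShardRouter:
    def __init__(self, shards):
        # shards: list of {"master": dsn, "slaves": [dsn, ...]}
        self.shards = shards
        self._counter = itertools.count()

    def shard_for(self, key):
        """Stable key -> shard mapping (same key always lands on the same shard)."""
        return self.shards[zlib.crc32(str(key).encode("utf-8")) % len(self.shards)]

    def route(self, key, write):
        shard = self.shard_for(key)
        if write or not shard["slaves"]:
            return shard["master"]  # writes (and reads with no slaves) go to master
        slaves = shard["slaves"]
        return slaves[next(self._counter) % len(slaves)]  # round-robin reads


router = ShardRouter([
    {"master": "db0-master", "slaves": ["db0-slave1", "db0-slave2"]},
    {"master": "db1-master", "slaves": ["db1-slave1", "db1-slave2"]},
])
write_target = router.route("item:1001", write=True)
read_targets = {router.route("item:1001", write=False) for _ in range(4)}
```

Plain modulo hashing moves most keys when a shard is added. The logical-to-physical mapping layer described in the next sections avoids that.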

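5.1.3. Logical Database Hosts

    A logical database host is presumably something like database middleware or a unified data access layer that hides the physical databases where the data is actually stored. The simplest way to do this yourself is to let the DAO layer of each use case use its own data source (i.e. multiple databases), rather than being limited to a single shared data source configuration.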

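5.1.4. Minimize DB Resources

   No business logic on the database server: no stored procedures, only simple triggers.

   Move CPU-intensive work into the application: referential integrity checks, joins, sorting and so on are all done in the application. This makes sense: in most applications the database is the bottleneck, and application servers are lower-spec and cheaper than database servers.

   Heavy use of prepared statements and bind variables.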
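With prepared statements and bind variables, the SQL text stays constant and only the parameter values change. The database can therefore reuse the parsed statement, and user input is never spliced into the SQL string. eBay did this through JDBC `PreparedStatement`s against Oracle. The sketch below shows the same idea with Python's built-in `sqlite3` module, whose `?` placeholders are bind variables and which caches prepared statements per connection. The `item` table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, seller_id INTEGER, title TEXT)")
# executemany reuses one prepared INSERT for every row.
conn.executemany(
    "INSERT INTO item (seller_id, title) VALUES (?, ?)",
    [(1, "camera"), (1, "lens"), (2, "tripod")],
)

FIND_BY_SELLER = "SELECT id, title FROM item WHERE seller_id = ? ORDER BY id"


def find_items_by_seller(seller_id):
    # Same SQL text on every call; only the bound value differs.
    return conn.execute(FIND_BY_SELLER, (seller_id,)).fetchall()


seller_1 = find_items_by_seller(1)
injection_attempt = find_items_by_seller("1 OR 1=1")  # bound as a plain value
```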

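5.1.5. Minimize DB Transactions

  Avoid deadlocks, reduce coupling, allow concurrent updates, and access split data seamlessly.

  Auto-commit is used for most database operations.

  No client-side transactions (in application code) at all: transactions on a single database are managed on the database server with anonymous PL/SQL blocks.

  XA distributed transactions are rarely used.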

      

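5.2. Application Tier

5.2.1. Scaling J2EE to the limit

Most J2EE features are not used; eBay mainly uses JDBC, servlets and a rewritten connection pool.

Keep the application tier stateless: there is no session state in the application tier, and transient state is kept in cookies or in the database.

Cache everything that can be cached: shared metadata is cached, with a sophisticated cache refresh mechanism. Caches are reloaded from local storage (an in-memory database?), and cached data uses the ThreadLocal pattern for thread safety.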
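The ThreadLocal point means each request-handling thread gets its own cache instance, so cached objects never need locking. eBay did this with Java's `ThreadLocal`; Python's `threading.local` plays the same role in the sketch below. The cache scope is an assumption made for illustration: one dictionary per thread, cleared explicitly, for example at the end of each request.

```python
import threading

_local = threading.local()


def thread_cache():
    """Return this thread's private cache, creating it on first use."""
    cache = getattr(_local, "cache", None)
    if cache is None:
        cache = _local.cache = {}
    return cache


def cached_lookup(key, loader):
    """Look key up in the current thread's cache; no lock is needed."""
    cache = thread_cache()
    if key not in cache:
        cache[key] = loader(key)
    return cache[key]


def clear_thread_cache():
    """Call at the end of a request so stale data does not outlive it."""
    _local.cache = {}


loads = []


def load_metadata(key):
    loads.append((threading.current_thread().name, key))
    return {"category": key}


cached_lookup("cat:42", load_metadata)
cached_lookup("cat:42", load_metadata)  # hit in this thread's cache
worker = threading.Thread(target=cached_lookup, args=("cat:42", load_metadata),
                          name="worker")
worker.start()
worker.join()  # the worker thread had its own, empty cache
```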

 

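5.2.2. Layered architecture model

Strictly following the J2EE model, the application is divided into a Presentation layer, a Business layer and an Integration layer.

Application servers do not communicate with each other, and no clustering is used.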

 

5.2.3. Data Access Layer

An eBay in-house, pure-Java O/R mapping solution (similar to Hibernate) is used.

All CRUD (Create Read Update Delete) operations go through the DAL's data interface layer.

The data tier can be scaled horizontally without code changes (presumably configuration files still need to change?).

Heavy use of JDBC prepared statements.
Dynamic data routing: Dynamic Data Routing (DDR) hides the physical location of data from developers by providing a mapping from logical names to physical tables and database servers on which they reside. eBay uses some scalability patterns to reduce complexity/latency. For example, on any given day, there are roughly 40 million distinct items for sale on eBay. These items are split amongst 20 different database servers. These details are hidden from developers: they access an Item object using its id (ItemID), the exact server from which it is fetched is computed by DDR, and queries are routed to that host+table at runtime.
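Here is a minimal sketch of the DDR idea. Callers ask for a logical object by id, and a routing table (which can change at runtime without code changes) maps it to a physical host and table. The bucket scheme and all host and table names are assumptions for illustration; the source does not describe how eBay's DDR computes the mapping. In this scheme, an id modulo a fixed number of buckets selects a bucket, and each bucket is assigned to a location. The 20 locations echo the 20 item database servers mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    host: str
    table: str


class DynamicDataRouter:
    def __init__(self, num_buckets, bucket_map):
        # bucket_map: logical name -> {bucket number: Location}
        self.num_buckets = num_buckets
        self.bucket_map = bucket_map

    def locate(self, logical_name, object_id):
        bucket = object_id % self.num_buckets
        return self.bucket_map[logical_name][bucket]

    def move_bucket(self, logical_name, bucket, new_location):
        """Re-point one bucket (e.g. after migrating its rows); callers are unaffected."""
        self.bucket_map[logical_name][bucket] = new_location

    def item_query(self, item_id):
        """Resolve where an Item lives and build the query for that host/table."""
        loc = self.locate("Item", item_id)
        return loc.host, f"SELECT * FROM {loc.table} WHERE item_id = ?", (item_id,)


router = DynamicDataRouter(
    num_buckets=20,
    bucket_map={"Item": {b: Location(f"itemdb{b:02d}", f"item_{b:02d}") for b in range(20)}},
)
before = router.locate("Item", 123)
router.move_bucket("Item", 3, Location("itemdb20", "item_03"))
after = router.locate("Item", 123)
```

Routing through a bucket table rather than hashing directly to hosts is what allows data to be moved by changing configuration alone.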

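DAL failover: the availability of data sources can be monitored automatically or manually. When a data source becomes unavailable, the DAL switches to a standby host according to preset rules; reportedly, users' operations are not interrupted during the switch. The DAL failover mechanism is presumably somewhat like HA or cluster functionality (an F5?). How it achieves this in real time and dynamically is worth studying.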

 
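5.2.4. Vertical Code Partitioning

Code is partitioned by function: applications are fine-grained and operate only on the data of a single partition (e.g. Selling, Buying). Domains contain the business logic shared across applications (presumably shared components).

Dependencies between applications are strictly limited: an application may depend only on Domains, never on other applications, and the shared Domains do not depend on one another.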
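These two rules are mechanical enough to check automatically, for example in a build step. The sketch below is a hypothetical checker with made-up module names and a dependency map (module -> modules it uses). It flags exactly the two rules stated above: application-to-application dependencies and Domain-to-Domain dependencies.

```python
def dependency_violations(deps, domains):
    """Return human-readable violations of the two dependency rules.

    deps: module name -> set of module names it depends on.
    domains: set of module names that are shared Domains; every other
    module is treated as an application.
    """
    problems = []
    for module in sorted(deps):
        for target in sorted(deps[module]):
            if module in domains and target in domains:
                problems.append(f"domain {module} -> domain {target}")
            elif module not in domains and target not in domains:
                problems.append(f"application {module} -> application {target}")
    return problems


domains = {"item-domain", "user-domain"}
deps = {
    "selling": {"item-domain"},
    "buying": {"item-domain", "selling"},   # breaks the application rule
    "item-domain": {"user-domain"},         # breaks the Domain rule
}
violations = dependency_violations(deps, domains)
for v in violations:
    print("violation:", v)
```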


 

5.2.5. Functional Segmentation

Split functionality into separate application pools

Reduce or isolate DB dependencies

Allow parallel development, deployment and monitoring

 

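5.2.6. Platform Decoupling

Decouple non-transactional Domains from transactional flows.

Integrate applications through asynchronous EDA and synchronous SOA patterns.

Use JMS for loose coupling between subsystems, and with the database.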
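The asynchronous, EDA-style integration can be illustrated with an in-process queue standing in for JMS. The producer publishes an event and returns immediately, and a consumer thread processes it later, so the producer does not depend on the consumer being available. This is only an analogy: `queue.Queue` has none of JMS's durability or cross-process delivery, and the `EventBus` class and event contents are hypothetical.

```python
import queue
import threading


class EventBus:
    """In-process stand-in for a JMS queue with a single consumer thread."""

    _STOP = object()  # sentinel telling the consumer to exit

    def __init__(self):
        self._queue = queue.Queue()

    def publish(self, event):
        """Called by the producer; returns immediately without waiting."""
        self._queue.put(event)

    def start_consumer(self, handler):
        def run():
            while True:
                event = self._queue.get()
                try:
                    if event is EventBus._STOP:
                        return
                    handler(event)
                finally:
                    self._queue.task_done()

        thread = threading.Thread(target=run, daemon=True)
        thread.start()
        return thread

    def stop(self, thread):
        """Let the consumer drain earlier events, then shut it down."""
        self._queue.put(EventBus._STOP)
        thread.join()


received = []
bus = EventBus()
consumer = bus.start_consumer(received.append)
for item_id in range(3):
    # e.g. the selling flow commits locally, then announces the sale
    bus.publish({"type": "ItemSold", "item_id": item_id})
bus.stop(consumer)
```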

 

 

6. Operations best practices

6.1. System deployment

• Demanding Requirements
– Entire site rolled every 2 weeks
– All deployments require staged rollout with immediate rollback if necessary.
– More than 100 WAR configurations.
– Dependencies exist between pools during some deployment operations.
– More than 15,000 instances across eight physical data centers.
• Rollout Plan
– Custom application that works from dependencies provided by projects.
– Creates transitive closure of dependencies.
– Generates rollout plan for Turbo Roller.
• Automated Rollout Tool (“Turbo Roller”)
– Manages full deployment cycle onto all application servers.
– Executes rollout plan.
– Built in checkpoints during rollout, including approvals.
– Optimized rollback, including full rollback of dependent pool
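The "transitive closure of dependencies" step can be sketched with Python's `graphlib` (Python 3.9+). Starting from the pools a release touches, the sketch collects every pool they depend on, then orders them so each pool is rolled only after its dependencies; the reverse order serves as the rollback plan. The pool names and dependency map are hypothetical. The rule that a dependency must be rolled out before its dependents is an assumption made for this sketch, not a description of Turbo Roller.

```python
from graphlib import TopologicalSorter


def transitive_dependencies(deps, pool):
    """All pools reachable from pool through the dependency map."""
    seen, stack = set(), [pool]
    while stack:
        for dep in deps.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def rollout_plan(deps, requested):
    """Order the requested pools plus their dependencies, dependencies first."""
    needed = set(requested)
    for pool in requested:
        needed |= transitive_dependencies(deps, pool)
    graph = {p: [d for d in deps.get(p, ()) if d in needed] for p in sorted(needed)}
    return list(TopologicalSorter(graph).static_order())  # CycleError on cycles


deps = {
    "buying": ["item-domain", "user-domain"],
    "selling": ["item-domain"],
    "item-domain": ["dal"],
    "user-domain": ["dal"],
}
plan = rollout_plan(deps, ["buying"])
rollback = list(reversed(plan))
```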

6.2. Network management and monitoring

Centralized Activity Logging (CAL)
– Transaction oriented logging per application server
• Transaction boundary starts at request. Nested transactions supported.
• Detailed logging of all application activity, especially database and other external resources.
• Application generated information and exceptions can be reported.
– Logging streams gathered and broadcast on a message bus.
• Subscriber to log to files (1.5TB/day)
• Subscriber to capture exceptions and generate operational alerts.
• Subscriber for real time application state monitoring.
– Extensive Reporting
• Reports on transactions (page and database) per pool.
• Relationships between URL’s and external resources.
• Inverted relationships between databases and pools/URL’s.
• Data cube reporting on several key metrics available in near real time.
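Here is a toy model of the CAL design above. Each application server wraps requests and resource calls in transactions, which may be nested. Every finished transaction is published as a record on a message bus, and independent subscribers consume the same stream: one writes a log, another raises alerts on failures. The in-process `MessageBus`, the `ActivityLogger` class and the record fields are hypothetical. A real deployment would use a networked message bus, and this logger is not thread-safe (a per-thread stack would be needed).

```python
import contextlib
import time


class MessageBus:
    """In-process stand-in for a message bus: every record goes to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, record):
        for callback in self.subscribers:
            callback(record)


class ActivityLogger:
    """Per-server, transaction-oriented logging; transactions may nest."""

    def __init__(self, bus, server, clock=time.monotonic):
        self.bus = bus
        self.server = server
        self.clock = clock
        self._stack = []  # names of the currently open transactions

    @contextlib.contextmanager
    def transaction(self, kind, name):
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        start = self.clock()
        status = "OK"
        try:
            yield
        except Exception as exc:
            status = f"ERROR:{type(exc).__name__}"
            raise
        finally:
            self._stack.pop()
            self.bus.publish({
                "server": self.server, "type": kind, "name": name,
                "parent": parent, "status": status,
                "duration": self.clock() - start,
            })


bus = MessageBus()
log_file, alerts = [], []


def alert_on_error(record):
    if record["status"] != "OK":
        alerts.append(record)


bus.subscribe(log_file.append)   # "log to files" subscriber
bus.subscribe(alert_on_error)    # operational-alert subscriber

cal = ActivityLogger(bus, server="app-pool-7")
with cal.transaction("URL", "/ViewItem"):
    with cal.transaction("SQL", "ITEM.SELECT_BY_ID"):
        pass
try:
    with cal.transaction("SQL", "ITEM.UPDATE"):
        raise TimeoutError("db timeout")
except TimeoutError:
    pass
```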

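7. Scaling Search

eBay's search system originally used Thunderstone, but by 2002 it had hit a performance bottleneck: a single index update took 9 hours, and even top-end hardware could not keep up. eBay's full-text search requirements are demanding. Product listings, bids and similar information must be updated in real time, many queries must return all results, and data is organized by keyword, by category and by structured attribute. Since no off-the-shelf product met all these needs, eBay developed its own full-text search system.

A real-time feeder platform reliably broadcasts updates from the primary database to multiple search nodes. The index supports real-time updates and in-memory indexes.

The indexing system is highly distributed: the index is held on several groups of replicas, each group made up of several machines.

Caching is also used in the search system, mainly to cache the results of common or very expensive searches.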
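Here is a toy single-node version of the ideas above. The feeder pushes item updates that are applied to an in-memory inverted index immediately, and query results are cached until the next update. `SearchNode`, the whitespace tokenizer and the clear-everything cache invalidation are simplifications for illustration. eBay's system distributes the index across replicated groups of machines, which this sketch does not attempt.

```python
from collections import defaultdict


class SearchNode:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of items containing it
        self.docs = {}                    # item id -> its set of terms
        self.cache = {}                   # normalized query -> result list

    @staticmethod
    def tokenize(text):
        return set(text.lower().split())

    def apply_update(self, item_id, text):
        """Real-time update from the feeder; text=None removes the item."""
        for term in self.docs.pop(item_id, set()):
            self.postings[term].discard(item_id)
        if text is not None:
            terms = self.tokenize(text)
            self.docs[item_id] = terms
            for term in terms:
                self.postings[term].add(item_id)
        self.cache.clear()  # simplest possible invalidation

    def search(self, query):
        """AND query over all terms; results are cached per normalized query."""
        key = " ".join(sorted(self.tokenize(query)))
        if key not in self.cache:
            terms = key.split()
            if terms:
                ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
            else:
                ids = set()
            self.cache[key] = sorted(ids)
        return self.cache[key]


node = SearchNode()
node.apply_update(1, "Vintage camera lens")
node.apply_update(2, "Digital camera")
both = node.search("camera")
vintage = node.search("camera VINTAGE")
node.apply_update(1, None)  # item 1 ended: remove it from the index
after_removal = node.search("camera")
```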

8. Platform development APIs



9. Networking solutions

http://www.sinogrid.com/shownews.asp?news_id=72

http://www.sinogrid.com/show_solution.asp?solution_id=43&cat_id=15

10. References

10.1. Product development

http://pages.ebay.com/community/chatter/2005november/insideebay.html

http://pages.ebay.com/community/chatter/2005december/insideebay.html

http://www.lukew.com/ff/entry.asp?318

http://www.slideshare.net/lukew/design-patterns-defining-and-sharing-web-design-languages

10.2. Architecture

http://www.artima.com/forums/flat.jsp?forum=106&thread=188683

http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf

http://www.addsimplicity.com/adding_simplicity_an_engi/2006/11/you_scaled_your.html

http://www.eweek.com/article2/0,1759,2041437,00.asp

http://highscalability.com/ebay-architecture

http://www.infoq.com/interviews/dan-pritchett-ebay-architecture

http://glinden.blogspot.com/2006/12/talk-on-ebay-architecture.html

http://www.ddj.com/blog/architectblog/archives/2007/08/ad_2007_the_eba.html

http://blogs.zdnet.com/service-oriented/?p=675

http://itmanagement.earthweb.com/service/article.php/3531291

http://article.pchome.net/content-123538.html

http://pages.ebay.com/community/chatter/2005november/insideebay.html

http://designcult.typepad.com/designcult/files/Design_Patterns_IA_Summit_public.pdf

http://au.sun.com/events/dev_forum/files/best_practices.pdf

http://gceclub.sun.com.cn/java_one_online/2003/TS-3264CHI(USA,2003)/ts3264ch.pdf 

http://download.oracle.com/oowsf2004/1235_wp.pdf

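Study of the eBay E-Commerce Platform (1): eBay's Product Development Process

I want to study the development methodology and system architecture patterns of eBay's e-commerce platform and the PayPal payment platform. The goal is to build up experience for creating electronic payment and e-commerce platforms with high availability, high reliability, high scalability, high security and high performance.

The study focuses on two areas:

  • eBay's product development management
  • Software architecture

1. eBay's business in numbers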

http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf

  • 212 million registered users, 1 billion photos
  • eBay users worldwide trade more than $1590 worth of goods every second
  • eBay averages over 1 billion page views per day
  • At any given time, there are approximately 105 million listings on the site
  • eBay stores over 2 Petabytes of data – over 200 times the size of the Library of Congress!
  • The eBay platform handles 3 billion API calls per month
  • 26 Billion SQL executions/day!
  • On an average day, it runs through 26 billion SQL queries and keeps tabs on 100 million items available for purchase.
  • In 33 countries, in seven languages, 24×7
  • 300+ features per quarter,Roll 100,000+ lines of code every two weeks
  • 99.94% availability, measured as “all parts of site functional to everybody” vs. at least one part of a site not functional to some users somewhere

     

    2. Evolution of the eBay e-commerce platform


     

    Version | Period | Core technology stack (Language, Web Server, DB, OS)
    The notes for each version follow its entry.

    V1.0 | 1995-1997/9 | Perl, Apache, GDBM, FreeBSD

    • Built over a weekend in Pierre Omidyar’s living room in 1995
    • System hardware was made up of parts that could be bought at Fry’s
    • Every item was a separate file, generated by a Perl script
    • No search functionality, only category browsing

    V2.0 | 1997/9-1999/2 | C++, IIS, Oracle (Solaris), NT

    • 3-tiered conceptual architecture (separation of bus/pres and db access tiers)
    • 2-tiered physical implementation (no application server)
    • C++ Library (eBayISAPI.dll) running on IIS on Windows
    • Microsoft index server used for search
    • Items migrated from GDBM to an Oracle database on Solaris

    V2.1 | 1999/2-1999/11 | C++, IIS, Oracle (Solaris), NT

    • Servers grouped into pools (small soldiers)
    • Resonate used for front end load balancing and failover
    • Search functionality moved to the Thunderstone indexing system
    • Back-end Oracle database server scaled vertically to a larger machine (Sun E10000)

    V2.3 | 1999/6-1999/11 | C++, IIS, Oracle (Solaris), NT

    • Second Database added for failover
    • CGI pools, Listings, Pages, and Search continued to scale horizontally
    • By November 1999, the database servers approached their limits of physical growth.

    V2.4 | 1999/11-2001/4 | C++, IIS, Oracle (Solaris), NT

    • Database “split” technology.
    • Logically partition database into separate instances.
    • Horizontal scalability through 2000, but not beyond

    V2.5 | 2001/4-2002/12 | C++, IIS, Oracle (Solaris), NT

    • Horizontal scalability through database splits
    • Items split by category
    • SPOF elimination

    V3.0 | 2002/12-present | Java, Sun Java System Web Server, Oracle (Solaris), Solaris

    • Replace C++/ISAPI with Java. Re-wrote the entire application in a J2EE application server framework
    • Leveraged the MSXML framework for the presentation layer
    • Implemented a development kernel as a foundation for programmers

     

    3. eBay's product development methodology

    3.1. Product management process

    http://pages.ebay.com/community/chatter/2005november/insideebay.html

    http://pages.ebay.com/community/chatter/2005december/insideebay.html

    http://pages.ebay.com/community/chatter/2003Apr/InsideeBay.html

    http://pages.ebay.com/community/chatter/2005February/homevisits.html

    http://creativityandinnovation.blogspot.com/2007/05/innovation-and-leadership-lessons-from.html

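    At the top level, eBay's product management process does not differ much from the development processes of most software companies. It broadly follows the process defined by standard software engineering or models such as CMMI: project planning (requirements gathering, business requirements specification, market analysis, profitability analysis, etc.) -> project approval (product requirements specification, approval review meeting, project plan, etc.) -> requirements analysis and design -> development -> testing -> launch and marketing. Once the core process is clearly defined, what matters most in software development is execution and continuous improvement of the process. Here eBay's product management process offers a lot worth learning from.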

    [Figure: eBay product management process]

     

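    At the core of product management is requirements management: collecting, organizing, tracking, reviewing, confirming, changing and verifying requirements. eBay uses the concept of a "requirements funnel" to describe how requirements change state as they move through the stages of the product management process. By filtering requirements layer by layer, it ensures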


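    • eBay's requirements-gathering channels

    Strategic Analysis:

    Community: besides the usual email, phone calls and community forums, eBay's "Voices" program, which resembles a user focus group, is an important channel

    Visits program: focuses mainly on user experience.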

     

     

    3.2. Software development methodology

    Our site is our product. We change it incrementally through implementing new features.
    • Very predictable development process – trains leave on-time at regular intervals (weekly).
    • Parallel development process with significant output — 100,000 LOC per release.
    • Always on – over 99.94% available.

    Judging from this, eBay seems to use an agile process or RUP, with iterative and incremental development.

    3.3. User experience design

    http://designcult.typepad.com/designcult/files/Design_Patterns_IA_Summit_public.pdf

    http://www.lukew.com/ff/entry.asp?318

    http://www.lukew.com/resources/articles/DesignPatterns_LW.pdf

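    3.4. Developer community

    An open e-commerce platform lets more people take part in improving the value chain, and this is also why Facebook has drawn more users than MySpace. Openness, however, has to be realized by developers, so a healthy developer community is essential to building a complete e-commerce ecosystem.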

    http://www.cioinsight.com/article2/0,1540,2074253,00.asp

    http://blog.programmableweb.com/2007/01/04/how-ebay-scales-their-devnet/ 

  • KOSMOS DISTRIBUTED FILE SYSTEM (KFS)

    http://kosmosfs.sourceforge.net/

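    This is an open-source project from the vertical search startup http://www.kosmix.com/. It is another open-source distributed file system, similar to the Google File System that underlies Google's MapReduce. It suits network applications that process large volumes of data, such as image storage, search engines, grid computing and data mining. It integrates well with Hadoop, so it can take full advantage of Hadoop's existing functionality. It is written in C++.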

    Introduction

    Applications that process large volumes of data (such as, search engines, grid computing applications, data mining applications, etc.) require a backend infrastructure for storing data. Such infrastructure is required to support applications whose workload could be characterized as:

    • Primarily write-once/read-many workloads
    • Few millions of large files, where each file is on the order of a few tens of MB to a few tens of GB in size
    • Mostly sequential access

    We have developed the Kosmos Distributed File System (KFS), a high performance distributed file system to meet this infrastructure need.

    The system consists of 3 components:

    1. Meta-data server : a single meta-data server that provides a global namespace
    2. Block server: Files are split into blocks or chunks and stored on block servers. Block servers are also known as chunk servers. Chunkservers store the chunks as files in the underlying file system (such as XFS on Linux)
    3. Client library: that provides the file system API to allow applications to interface with KFS. To integrate applications to use KFS, applications will need to be modified and relinked with the KFS client library.

    KFS is implemented in C++. It is built using standard system components such as, TCP sockets, aio (for disk I/O), STL, and boost libraries. It has been tested on 64-bit x86 architectures running Linux FC5.

    While KFS can be accessed natively from C++ applications, support is also provided for Java applications. JNI glue code is included in the release to allow Java applications to access the KFS client library APIs.

    Features
    • Incremental scalability: New chunkserver nodes can be added as storage needs increase; the system automatically adapts to the new nodes.
    • Availability: Replication is used to provide availability due to chunk server failures. Typically, files are replicated 3-way.
    • Per file degree of replication: The degree of replication is configurable on a per file basis, with a max. limit of 64.
    • Re-replication: Whenever the degree of replication for a file drops below the configured amount (such as, due to an extended chunkserver outage), the metaserver forces the block to be re-replicated on the remaining chunk servers. Re-replication is done in the background without overwhelming the system.
    • Re-balancing: Periodically, the meta-server may rebalance the chunks amongst chunkservers. This is done to help with balancing disk space utilization amongst nodes.
    • Data integrity: To handle disk corruptions to data blocks, data blocks are checksummed. Checksum verification is done on each read; whenever there is a checksum mismatch, re-replication is used to recover the corrupted chunk.
    • File writes: The system follows the standard model. When an application creates a file, the filename becomes part of the filesystem namespace. For performance, writes are cached at the KFS client library. Periodically, the cache is flushed and data is pushed out to the chunkservers. Also, applications can force data to be flushed to the chunkservers. In either case, once data is flushed to the server, it is available for reading.
    • Leases: KFS client library uses caching to improve performance. Leases are used to support cache consistency.
    • Chunk versioning: Versioning is used to detect stale chunks.
    • Client side fail-over: The client library is resilient to chunkserver failures. During reads, if the client library determines that the chunkserver it is communicating with is unreachable, the client library will fail-over to another chunkserver and continue the read. This fail-over is transparent to the application.
    • Language support: KFS client library can be accessed from C++, Java, and Python.
    • FUSE support on Linux: By mounting KFS via FUSE, this support allows existing linux utilities (such as, ls) to interface with KFS.
    • Tools: A shell binary is included in the set of tools. This allows users to navigate the filesystem tree using utilities such as, cp, ls, mkdir, rmdir, rm, mv. Tools to also monitor the chunk/meta-servers are provided.
    • Deploy scripts: To simplify launching KFS servers, a set of scripts to (1) install KFS binaries on a set of nodes, (2) start/stop KFS servers on a set of nodes are also provided.
    • Job placement support: The KFS client library exports an API to determine the location of a byte range of a file. Job placement systems built on top of KFS can leverage this API to schedule jobs appropriately.
    • Local read optimization: When applications are run on the same nodes as chunkservers, the KFS client library contains an optimization for reading data locally. That is, if the chunk is stored on the same node as the one on which the application is executing, data is read from the local node.
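Several of the features above fit together as sketched below: replication, checksum verification on every read, re-replication and client-side fail-over. This is a toy in-memory model in Python, not KFS code. KFS itself is C++ with a separate client library. The class names, the CRC-32 checksum and the choice of which servers receive new replicas are assumptions for illustration.

```python
import zlib


class Chunkserver:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.chunks = {}  # chunk_id -> (data, checksum)

    def store(self, chunk_id, data):
        self.chunks[chunk_id] = (data, zlib.crc32(data))

    def read(self, chunk_id):
        data, checksum = self.chunks[chunk_id]
        if zlib.crc32(data) != checksum:  # verify on every read
            raise ValueError(f"checksum mismatch on {self.name}")
        return data


class MetaServer:
    def __init__(self, servers, replication=3):
        self.servers = {s.name: s for s in servers}
        self.replication = replication
        self.locations = {}  # chunk_id -> names of servers holding a replica

    def write(self, chunk_id, data):
        targets = [s for s in self.servers.values() if s.alive][: self.replication]
        for s in targets:
            s.store(chunk_id, data)
        self.locations[chunk_id] = [s.name for s in targets]

    def read(self, chunk_id):
        """Client-side fail-over: try replicas in turn, dropping corrupted ones."""
        for name in list(self.locations[chunk_id]):
            server = self.servers[name]
            if not server.alive:
                continue
            try:
                data = server.read(chunk_id)
            except ValueError:
                self.locations[chunk_id].remove(name)  # corrupted replica
                continue
            self.re_replicate(chunk_id, data)
            return data
        raise OSError(f"no readable replica of {chunk_id}")

    def re_replicate(self, chunk_id, data):
        """Top the chunk back up to the configured replication degree."""
        holders = [n for n in self.locations[chunk_id] if self.servers[n].alive]
        for server in self.servers.values():
            if len(holders) >= self.replication:
                break
            if server.alive and server.name not in holders:
                server.store(chunk_id, data)
                holders.append(server.name)
        self.locations[chunk_id] = holders


servers = [Chunkserver(n) for n in "abcd"]
meta = MetaServer(servers, replication=3)
meta.write("chunk-1", b"hello kfs")
data, checksum = servers[0].chunks["chunk-1"]
servers[0].chunks["chunk-1"] = (b"hellx kfs", checksum)  # simulate disk corruption
servers[1].alive = False                                # simulate an outage
recovered = meta.read("chunk-1")
```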
    KFS with Hadoop

    KFS has been integrated with Hadoop using Hadoop’s filesystem interfaces. This allows existing Hadoop applications to use KFS seamlessly. The integration code has been submitted as a patch to Hadoop-JIRA-1963 (this will enable distribution of the integration code with Hadoop). In addition, the code as well as instructions will also be available for download from the KFS project page shortly.
    As part of the integration, there is job placement support for Hadoop. That is, the Hadoop Map/Reduce job placement system can schedule jobs on the nodes where the chunks are stored.

    References:

    • Distributed file systems

    http://lucene.apache.org/hadoop/

    http://www.danga.com/mogilefs/

    http://www.lustre.org/

    http://oss.sgi.com/projects/xfs/

     

    http://www.megite.com/discover/filesystem

    http://swik.net/distributed+cluster

    • Clustering and high availability

    http://www.gluster.org/index.php

    http://www.linux-ha.org/

    http://openssi.org

    http://kerrighed.org/

    http://openmosix.sourceforge.net/

     

    http://www.linux.com/article.pl?sid=06/09/12/1459204

    http://labs.google.com/papers/mapreduce.html

     

     

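    A Roundup of Firebug and YSlow Resources

    The post "Firebug and YSlow resources" already lists some documents. As a supplement, here is a further collection of Firebug-related documentation for reference.

    Official Firebug documentation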

    An In-depth Look At The Future of Javascript Debugging With Firebug

    Estelle Weyl’s Introduction to Firebug

    John Barton’s introduction to the Firebug source (Firebug internals)

    Firebug Crash Course(slideshare)

    Hacking Web 2.0 Applications with Firefox

    Hacking Digg With Firebug and jQuery

    Firebug Tutorial - Logging, Profiling and CommandLine (Part I)

    Firebug Tutorial - Logging, Profiling and CommandLine (Part II)

    AJAX Debugging with Firebug

    A powerful JavaScript debugging tool: Firebug explained in detail

    Getting started with Firebug (full text): using Firebug

     

    Introduction to YSlow: optimizing your actual and perceived download speed

    yahoo developer network YSlow project

    rules for high performance web sites

    High Performance Websites Lalit Patel(slideshare)

    High performance web sites(slideshare)

    Using Firebug & YSlow(slideshare)

    Blog of a member of Yahoo's YUI team


     

     

     
