Archive for the 'Technology - Java' Category


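REST (Representational State Transfer) Support for the Portal Community: Technical Implementation Plan

    In developing the portal community, we rely on static generation of dynamic pages plus AJAX wherever possible to meet our performance requirements. For the dynamic requests that remain, basic SEO practice says that every page URL should look like a static page rather than a dynamic one, so that search engines index more of the site. Although search engines such as google and baidu claim to support crawling dynamic pages, the crawl rate is still not in the same order of magnitude as for traditional html files. URLs should therefore not take the form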

    http://www.yeeach.com/login.action?username=myusername&passwd=mypasswd

    but should instead take the following form:

    http://www.yeeach.com/login.action/username/myusername/passwd/mypasswd

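    Alternatively, a RoR-style convention can be used, where rules define what the parameter at each position means:

    http://www.yeeach.com/login.action/myusername/mypasswd

    We currently build on Struts2 + Spring + Hibernate. Adopting this URL style calls for the following adjustments to how we develop: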

1. Writing pages

    When writing pages, URL links in a page should take the form

    <a href="http://www.yeeach.com/login.action/username/myusername/passwd/mypasswd">test</a>

    rather than

    <a href="http://www.yeeach.com/login.action?username=myusername&passwd=mypasswd">test</a>

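2. Request handling logic

    The following options are available:

  • Rely on the REST support built into a framework such as struts2, RoR, or CakePHP

    Features: still fairly limited. In addition, the official Struts2 release packages only include this feature by default from 2.1.2 onward; with earlier versions you have to build it from source yourself.

    Performance: depends on how the framework handles it. struts2 uses a filter plus a plugin, so performance is probably only average.

    Development model: developers must follow each framework's REST conventions. Getting everyone to change their development habits is not easy right now, so we will not use this approach for the time being.

  • Rely on mod_rewrite support in apache or lighttpd

    Features: rewrite support is comprehensive and rich, and is provided by apache or lighttpd itself.

    Performance: the best of the three options

    Development model: mod_rewrite automatically converts http://www.yeeach.com/login.action/username/myusername/passwd/mypasswd into http://www.yeeach.com/login.action?username=myusername&passwd=mypasswd and then hands the request to login.action on tomcat. On the struts2 side, the back-end logic is the same as in normal development.

    However, this requires apache or lighttpd during development, which is relatively cumbersome. During development, the UrlRewriteFilter described below can be used instead of mod_rewrite.

    Deployment model: use this approach in deployment

  • Use the Java UrlRewriteFilter

    Features: implements part of the functionality of mod_rewrite, but depends on the application server and performs poorly for static pages.

    Performance: average, the same as any other filter

    Development model: configure UrlRewriteFilter as a stand-in for mod_rewrite during development

    Deployment model: use mod_rewrite in deployment; UrlRewriteFilter is generally not used there, except in special cases (for example, for legacy interfaces)

3. Conclusion:

    During development, use UrlRewriteFilter to avoid the hassle of installing and deploying lighttpd. In deployment, use lighttpd's mod_rewrite to support the REST style. The struts2 REST Plugin is not used.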
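
    The rewrite rule itself is small enough to sketch in plain Java. The helper below only illustrates the path-to-query conversion described above; it is not how UrlRewriteFilter or mod_rewrite are implemented. The class name PathToQueryRewriter and its method are hypothetical, and the sketch assumes that the action name ends in .action and that the remaining path segments come in name/value pairs.

```java
// Hypothetical helper: illustrates the rewrite rule only.
final class PathToQueryRewriter {
    private static final String ACTION_SUFFIX = ".action";

    private PathToQueryRewriter() {}

    /**
     * Turns /login.action/username/u/passwd/p into /login.action?username=u&passwd=p.
     * Paths that do not match the assumed shape are returned unchanged.
     */
    static String rewrite(String path) {
        int idx = path.indexOf(ACTION_SUFFIX + "/");
        if (idx < 0) {
            return path; // no path-style parameters after the action name
        }
        int actionEnd = idx + ACTION_SUFFIX.length();
        String action = path.substring(0, actionEnd);
        String[] segments = path.substring(actionEnd + 1).split("/");
        if (segments.length % 2 != 0) {
            return path; // odd number of segments: not name/value pairs
        }
        StringBuilder query = new StringBuilder();
        for (int i = 0; i < segments.length; i += 2) {
            query.append(i == 0 ? '?' : '&')
                 .append(segments[i]).append('=').append(segments[i + 1]);
        }
        return action + query;
    }
}
```

    In a real deployment the same mapping would be expressed as a rewrite rule in the web server or filter configuration rather than in application code.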

 

htmlparser encoding problem

    When htmlparser is used to crawl certain pages (for example http://bbs.pcpop.com/O71228/1286458.html), it throws org.htmlparser.util.EncodingChangeException.

For example, running the following code (a junit test):

public void testLinkTag() {
    try {
        NodeFilter filter = new NodeClassFilter(LinkTag.class);
        Parser parser = new Parser();
        parser.setURL("http://bbs.pcpop.com/O71228/1286458.html");
        parser.setEncoding(parser.getEncoding());
        logger.fatal("Encoding is " + parser.getEncoding());
        NodeList list = parser.extractAllNodesThatMatch(filter);
        for (int i = 0; i < list.size(); i++) {
            LinkTag node = (LinkTag) list.elementAt(i);
            logger.fatal("testLinkTag() Link is :" + node.extractLink());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The following exception is thrown:

org.htmlparser.util.EncodingChangeException: character mismatch (new: 涓 [0x6d93] != old: 中 [0x4e2d]) for encoding change from UTF-8 to GB2312 at character offset 158

    at org.htmlparser.lexer.InputStreamSource.setEncoding(InputStreamSource.java:280)

    at org.htmlparser.lexer.Page.setEncoding(Page.java:865)

    at org.htmlparser.tags.MetaTag.doSemanticAction(MetaTag.java:150)

    at org.htmlparser.scanners.TagScanner.scan(TagScanner.java:69)

    at org.htmlparser.scanners.CompositeTagScanner.scan(CompositeTagScanner.java:160)

    at org.htmlparser.util.IteratorImpl.nextNode(IteratorImpl.java:92)

    at org.htmlparser.Parser.visitAllNodesWith(Parser.java:726)

    at ParserTestCase1.testImageVisitor(ParserTestCase1.java:71)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

    at java.lang.reflect.Method.invoke(Method.java:585)

    at junit.framework.TestCase.runTest(TestCase.java:154)

    at junit.framework.TestCase.runBare(TestCase.java:127)

    at junit.framework.TestResult$1.protect(TestResult.java:106)

    at junit.framework.TestResult.runProtected(TestResult.java:124)

    at junit.framework.TestResult.run(TestResult.java:109)

    at junit.framework.TestCase.run(TestCase.java:118)

    at junit.framework.TestSuite.runTest(TestSuite.java:208)

    at junit.framework.TestSuite.run(TestSuite.java:203)

    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)

    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)

    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)

    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)

    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)

    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

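Analyzing pages of this kind shows that the main cause is a problem in how org.htmlparser.tags.MetaTag handles the page's default encoding.

For the page http://bbs.pcpop.com/O71228/1286458.html, the encoding declared in the page itself is gb2312:

        <META http-equiv="Content-Type" content="text/html; charset=gb2312">

The server's response, however, declares utf-8, so the browser decodes the page as utf-8.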

HTTP/1.x 200 OK

Date: Thu, 19 Jun 2008 03:16:53 GMT

Server: Microsoft-IIS/6.0

X-Powered-By: ASP.NET

X-AspNet-Version: 2.0.50727

Cache-Control: private

Content-Type: text/html; charset=utf-8

Content-Length: 130386
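
The character mismatch in the exception message can be reproduced without htmlparser. The sketch below uses only the JDK. The class name EncodingMismatchDemo is hypothetical, and the sketch assumes the JDK ships the GB2312 charset, which standard JDK distributions normally do. It decodes the UTF-8 bytes of 中 (U+4E2D) once as UTF-8 and once as GB2312, which is what happens when the parser switches encodings partway through the stream.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; uses only JDK APIs.
final class EncodingMismatchDemo {
    private EncodingMismatchDemo() {}

    /** Returns the first character obtained by decoding the bytes with the given charset. */
    static char firstChar(byte[] bytes, Charset charset) {
        return new String(bytes, charset).charAt(0);
    }

    /** UTF-8 bytes of U+4E2D U+6587, as a server sending utf-8 would emit them. */
    static byte[] sampleBytes() {
        return "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
    }
}
```

Decoding sampleBytes() as UTF-8 gives U+4E2D as the first character, while decoding the same bytes as GB2312 gives U+6D93, the same pair reported in the exception.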

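In htmlparser, however, even after parser.setEncoding(parser.getEncoding()) has been called, MetaTag does not keep the encoding set on the Parser when it processes the tag.

The fix is as follows:

    public void doSemanticAction ()
        throws
            ParserException
    {
        String httpEquiv;
        String charset;

        httpEquiv = getHttpEquiv ();
        if ("Content-Type".equalsIgnoreCase (httpEquiv)){
             // Original behavior: always switch to the charset declared in the META tag.
             //charset = getPage ().getCharset (getAttribute ("CONTENT"));
             //getPage ().setEncoding (charset);
             // Only apply the META charset while the page is still on the default encoding.
             // Strings are compared with equals() rather than ==.
             if (Page.DEFAULT_CHARSET.equals (getPage ().getEncoding ())){
                 charset = getPage ().getCharset (getAttribute ("CONTENT"));
                 getPage ().setEncoding (charset);
             }
        }
    }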


Funambol SyncML J2ME Client Build Guide

A brief description of how I installed and tested the funambol j2me client. The process is only meant for testing the j2me client's functionality and does not consider a complete project setup; it is kept here for reference.

1. Install ant

Download ant from http://apache.mirror.phpchina.com/ant/binaries/apache-ant-1.7.0-bin.zip

Unzip it to c:\ant

2. Install antenna

Download antenna from http://downloads.sourceforge.net/antenna/antenna-bin-1.0.2.jar

Copy it to c:\ant\lib

3. Install wtk

Download wtk from http://cds-esd.sun.com/ESD36/JSCDL/sun_java_wireless_toolkit/2.5.2/sun_java_wireless_toolkit-2_5_2-windows.exe?AuthParam=1212392225_9f66723e80a5d5619add0637b1c54f9b&TicketId=B%2Fw5kByCTl9JSBVLP1dSlQHm&GroupName=CDS&FilePath=/ESD36/JSCDL/sun_java_wireless_toolkit/2.5.2/sun_java_wireless_toolkit-2_5_2-windows.exe

and install it to c:\WTK2.5.2

4. Download JMUnit

Download JMUnit from http://jaist.dl.sourceforge.net/sourceforge/jmunit/JMUnit_1.0.1.zip

Unzip it and copy the contents to C:\WTK2.5.2\bin and C:\WTK2.5.2\lib

5. Install EclipseME in Eclipse

6. Get the funambol j2me package from the objectweb cvs repository

A prebuilt funambol j2me package can be downloaded from http://download.forge.objectweb.org/sync4j/funambol-j2me-api-6.5.10.zip

For development convenience, I checked the code out of cvs and built it myself

cvs repository information:

Authentication method: pserver

Host: cvs.forge.objectweb.org

User name: anonymous

CVSROOT:anonymous@cvs.forge.objectweb.org:/cvsroot/sync4j

Check out the files under /funambol/client-api/j2me

7. Build the common package

7.1. Create the funambol-j2me-common project

In eclipse, create a J2ME Midlet Suite project named funambol-j2me-common and import the files under /funambol/client-api/j2me/common

Since the BlackBerry SDK is not installed, delete the following files:

src/com/funambol/storage/BlackberryRecordStore.java

src/com/funambol/storage/BlackberryRecordEnumeration.java

src/com/funambol/util/BlackberryHelper.java

7.2. Edit build.properties

#

# Funambol J2ME API build properties file

#

j2me.name=funambol-j2me-common

j2me.release.major=7

j2me.release.minor=0

j2me.build.number=0

# Set this to your WTK installation, or copy the JAR to the local /lib

wtk.home=C:/WTK2.5.2

wtk.debug=false

# Set this to your JMunit installation, or copy the JAR to the local /lib

lib.junit=C:/WTK2.5.2/lib/JMUnit4CLDC10.jar

# Set this to your JZlib jar file, or copy the JAR to the local /lib

#lib.jzlib=lib/jzlib-1-0-7a.jar

lib.gzip=lib/tinyline-gzip.jar

7.3. Run build.xml with ant

Make sure antenna is added to ant's Classpath, and do not use eclipse's own Ant Home

Run As ->Open External Tools Dialog->Ant Build->funambol-j2me-common->Classpath->Ant Home

7.4. Use EclipseME's tools to create the package, so that the funambol-j2me-syncml project can use it when compiling

Right Button->J2ME->Create Package

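The generated package files funambol-j2me-common.jar and funambol-j2me-common.jad are placed in the deployed directory

8. Build the syncml package

8.1. Create the funambol-j2me-syncml project

In eclipse, create a J2ME Midlet Suite project named funambol-j2me-syncml, import the files under /funambol/client-api/j2me/syncml, and then create a lib directory in funambol-j2me-syncml

8.2. Copy the funambol-j2me-common files

Copy funambol-j2me-common.jar from the deployed directory of funambol-j2me-common into the lib directory of funambol-j2me-syncml

Copy the files under the build output directory of funambol-j2me-common (assumed to be bin) into the build output directory of funambol-j2me-syncml (assumed to be bin)

8.3. Edit build.properties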

#

# Funambol J2ME API build properties file

#

j2me.name=funambol-j2me-syncml

j2me.release.major=7

j2me.release.minor=0

j2me.build.number=0

# Set this to your WTK installation, or copy the JAR to the local /lib

wtk.home=C:/wtk2.5.2

wtk.debug=false

# Set this to your JMunit installation, or copy the JAR to the local /lib

lib.junit=C:/WTK2.5.2/lib/JMUnit4CLDC10.jar

# Uncomment this to refer to the output lib of your ‘common’ module, or copy

# the JAR to the local /lib

lib.funambol.common=${basedir}/lib/funambol-j2me-common.jar

8.4. Edit build.xml

Change

<target name="preprocess" depends="init">
    <mkdir dir="${dir.preproc.src}"/>
    <wtkpreprocess srcdir="${dir.src}"
        destdir="${dir.preproc.src}"
        symbols="${device.isBlackberry_plugin}"
        verbose="false" indent="false">
    </wtkpreprocess>
</target>

to

<target name="preprocess" depends="init">
    <mkdir dir="${dir.preproc.src}"/>
    <wtkpreprocess srcdir="${dir.src}"
        destdir="${dir.preproc.src}"
        verbose="false" indent="false">
    </wtkpreprocess>
</target>

In other words, simply delete symbols="${device.isBlackberry_plugin}"

8.5. Write the test MIDlet SyncMidlet.java in the test directory

package com.funambol.syncml.client;

import java.util.Random;

import javax.microedition.lcdui.Display;
import javax.microedition.lcdui.TextBox;
import javax.microedition.midlet.MIDlet;
import javax.microedition.midlet.MIDletStateChangeException;

import com.funambol.syncml.protocol.SyncML;
import com.funambol.syncml.spds.SourceConfig;
import com.funambol.syncml.spds.SyncConfig;
import com.funambol.syncml.spds.SyncManager;
import com.funambol.syncml.spds.SyncSource;
import com.funambol.util.Log;

public class SyncMidlet extends MIDlet {

    private static final String STORE_NAME = "TESTCONFIG";
    private static final String SOURCE_NAME = "source.briefcase";
    private static final String URL = "http://localhost:8080/funambol/ds";
    private static final String userName = "liangchuan";
    private static final String password = "liangchuan";

    private Display display;
    private TextBox t;
    private SourceConfig sc;
    private SyncConfig conf;
    private SyncManager sm;
    private TestSyncSource testsrc;
    private TestSyncListener sl;

    public SyncMidlet() {
        display = Display.getDisplay(this);
        t = new TextBox("Syncml Test MIDlet", "Syncml Test MIDP!", 256, 0);
    }

    protected void destroyApp(boolean arg0) throws MIDletStateChangeException {
    }

    protected void pauseApp() {
        display.setCurrent(t);
    }

    protected void startApp() throws MIDletStateChangeException {
        sc = new SourceConfig();
        conf = new SyncConfig();
        conf.syncUrl = URL;
        conf.userName = userName;
        conf.password = password;
        conf.deviceConfig.devID = generateDeviceId();
        sc.setType("text/plain");
        sc.setEncoding(SyncSource.ENCODING_NONE);
        sm = new SyncManager(conf);
        testsrc = new TestSyncSource(sc);
        sl = new TestSyncListener();
        testsrc.setListener(sl);
        sm.sync(testsrc, SyncML.ALERT_CODE_SLOW);
        display.setCurrent(t);
    }

    // Builds a device id from a fixed prefix, the current time, and a random number
    public String generateDeviceId() {
        Random r = new Random();
        StringBuffer s = new StringBuffer("fsc-j2me-api-test-");
        s.append(Long.toString(System.currentTimeMillis(), 16));
        s.append(Integer.toHexString(r.nextInt()));
        String deviceId = s.toString();
        return deviceId;
    }
}

8.6. Run the build[default], compile, and compiletest targets of build.xml with ant

Make sure antenna is added to ant's Classpath, and do not use eclipse's own Ant Home

Run As ->Open External Tools Dialog->Ant Build->funambol-j2me-syncml->Classpath->Ant Home

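8.7. Copy the funambol-j2me-common class files for packaging

Copy the files under output/classes of funambol-j2me-common into the build output directory of funambol-j2me-syncml (assumed to be bin)

8.8. Copy the funambol-j2me-syncml project's files for packaging

Copy the files under out/classes, where ant puts the compiled funambol-j2me-syncml classes, into the build output directory of funambol-j2me-syncml (assumed to be bin)

8.9. Create the package with EclipseME's tools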

Right Button->J2ME->Create Package

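The generated package files funambol-j2me-syncml.jar and funambol-j2me-syncml.jad are placed in the deployed directory

8.10. Edit funambol-j2me-syncml.jad

Double-click funambol-j2me-syncml.jad and, on the Midlets tab, add a MIDlet-1 entry with the following content: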

Name :SyncMidlet

Icon: none

Class:com.funambol.syncml.client.SyncMidlet

Or edit funambol-j2me-syncml.jad directly and add

MIDlet-1: SyncMidlet,,com.funambol.syncml.client.SyncMidlet

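8.11. Run SyncMidlet through WTK's Run MIDP Application

8.12. Notes on packaging with EclipseME:

Packaging with eclipseme seems quite unstable; sometimes not all of the files end up in the package. Workaround:

Manually add the class files under funambol-j2me-syncml/bin (including the funambol-j2me-common classes) to the jar generated by eclipseme, using jar or winrar. Then edit the generated funambol-j2me-syncml.jad by hand and update MIDlet-Jar-Size; the value can be read from the file properties of funambol-j2me-syncml.jar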

MIDlet-Jar-Size: 115491

MIDlet-Jar-URL: funambol-j2me-syncml.jar

MIDlet-Name: funambol-j2me-syncml Midlet Suite

MIDlet-Vendor: Midlet Suite Vendor

MIDlet-Version: 1.0.8

MicroEdition-Configuration: CLDC-1.1

MicroEdition-Profile: MIDP-2.0

MIDlet-Name: synctest Midlet Suite

MIDlet-1: SyncMidlet,,com.funambol.syncml.client.SyncMidlet
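
Reading the size from the file properties by hand is easy to get wrong. As a small sketch that uses only the JDK, the helper below computes the MIDlet-Jar-Size line from the jar file itself. The class name JadSizeHelper and its method are hypothetical and are not part of EclipseME or WTK.

```java
import java.io.File;

// Hypothetical helper for hand-edited JAD files.
final class JadSizeHelper {
    private JadSizeHelper() {}

    /** Builds the MIDlet-Jar-Size line for the given jar, using its exact length in bytes. */
    static String jarSizeLine(File jar) {
        if (!jar.isFile()) {
            throw new IllegalArgumentException("Not a file: " + jar);
        }
        return "MIDlet-Jar-Size: " + jar.length();
    }
}
```

The returned line can be pasted over the existing MIDlet-Jar-Size entry after every manual repackaging.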

 

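Static Page Generation Strategy in Struts2

    Generating static pages with Struts2 is very flexible and powerful, especially because Struts2 supports Freemarker well, so Freemarker's templating can be used to the full to generate the pages.

    The basic idea: use Struts2's support for custom result types to define a result type that can generate static pages. Combined with the Freemarker template engine, this allows static pages to be generated in bulk.

    Following the implementation of org.apache.struts2.views.freemarker.FreemarkerResult, I defined my own result type for generating static pages. The approach is not limited to static pages; it can also be used to generate content such as wml or xhtml. For details, see the implementations of the result types that Struts2 provides by default.

1. com.mobilesoft.esales.webapp.action.FreemarkerResult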

package com.mobilesoft.esales.webapp.action;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.Locale;

import javax.servlet.ServletContext;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts2.ServletActionContext;
import org.apache.struts2.dispatcher.StrutsResultSupport;
import org.apache.struts2.views.freemarker.FreemarkerManager;
import org.apache.struts2.views.util.ResourceUtil;

import com.opensymphony.xwork2.ActionContext;
import com.opensymphony.xwork2.ActionInvocation;
import com.opensymphony.xwork2.LocaleProvider;
import com.opensymphony.xwork2.inject.Inject;
import com.opensymphony.xwork2.util.ValueStack;

import freemarker.template.Configuration;
import freemarker.template.ObjectWrapper;
import freemarker.template.Template;
import freemarker.template.TemplateException;
import freemarker.template.TemplateModel;
import freemarker.template.TemplateModelException;

public class FreemarkerResult extends StrutsResultSupport {

    private static final long serialVersionUID = -3778230771704661631L;

    protected ActionInvocation invocation;
    protected Configuration configuration;
    protected ObjectWrapper wrapper;
    protected FreemarkerManager freemarkerManager;
    private Writer writer;
    protected String location;
    private String pContentType = "text/html";

    protected String fileName; // name of the static page to generate
    protected String filePath; // directory of the static page to generate
    protected String staticTemplate; // path of the Freemarker template used to generate the static page

    public FreemarkerResult() {
        super();
    }

    public FreemarkerResult(String location) {
        super(location);
    }

    @Inject
    public void setFreemarkerManager(FreemarkerManager mgr) {
        this.freemarkerManager = mgr;
    }

    public void setContentType(String aContentType) {
        pContentType = aContentType;
    }

    public String getContentType() {
        return pContentType;
    }

    public void doExecute(String location, ActionInvocation invocation)
            throws IOException, TemplateException {
        this.location = location;
        this.invocation = invocation;
        this.configuration = getConfiguration();
        this.wrapper = getObjectWrapper();

        this.fileName = (String) conditionalParse(fileName, invocation);
        this.staticTemplate = (String) conditionalParse(staticTemplate, invocation);
        String parsedPath = (String) conditionalParse(filePath, invocation);
        this.filePath = parsedPath == null ? "" : parsedPath;

        if (!location.startsWith("/")) {
            ActionContext ctx = invocation.getInvocationContext();
            HttpServletRequest req = (HttpServletRequest) ctx
                    .get(ServletActionContext.HTTP_REQUEST);
            String base = ResourceUtil.getResourceBase(req);
            location = base + "/" + location;
        }

        // Template used to render the html response
        Template template = configuration.getTemplate(location, deduceLocale());
        // Template used to render the static page
        Template staticTemplate = configuration.getTemplate(this.staticTemplate,
                deduceLocale());

        TemplateModel model = createModel();
        String path = ServletActionContext.getServletContext().getRealPath(
                filePath)
                + File.separator;
        Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(path + fileName)));

        try {
            if (preTemplateProcess(template, model)) {
                try {
                    staticTemplate.process(model, out);
                    template.process(model, getWriter());
                } finally {
                    postTemplateProcess(template, model);
                    postTemplateProcess(staticTemplate, model);
                }
            }
        } finally {
            // Close the file writer so that buffered output reaches the static page
            out.close();
        }
    }

    protected Configuration getConfiguration() throws TemplateException {
        return freemarkerManager.getConfiguration(ServletActionContext
                .getServletContext());
    }

    protected ObjectWrapper getObjectWrapper() {
        return configuration.getObjectWrapper();
    }

    public void setWriter(Writer writer) {
        this.writer = writer;
    }

    protected Writer getWriter() throws IOException {
        if (writer != null) {
            return writer;
        }
        return ServletActionContext.getResponse().getWriter();
    }

    protected TemplateModel createModel() throws TemplateModelException {
        ServletContext servletContext = ServletActionContext
                .getServletContext();
        HttpServletRequest request = ServletActionContext.getRequest();
        HttpServletResponse response = ServletActionContext.getResponse();
        ValueStack stack = ServletActionContext.getContext().getValueStack();

        Object action = null;
        if (invocation != null)
            action = invocation.getAction(); // Added for NullPointException
        return freemarkerManager.buildTemplateModel(stack, action,
                servletContext, request, response, wrapper);
    }

    protected Locale deduceLocale() {
        if (invocation.getAction() instanceof LocaleProvider) {
            return ((LocaleProvider) invocation.getAction()).getLocale();
        } else {
            return configuration.getLocale();
        }
    }

    protected void postTemplateProcess(Template template, TemplateModel data)
            throws IOException {
    }

    protected boolean preTemplateProcess(Template template, TemplateModel model)
            throws IOException {
        Object attrContentType = template.getCustomAttribute("content_type");

        if (attrContentType != null) {
            ServletActionContext.getResponse().setContentType(
                    attrContentType.toString());
        } else {
            String contentType = getContentType();

            if (contentType == null) {
                contentType = "text/html";
            }

            String encoding = template.getEncoding();

            if (encoding != null) {
                contentType = contentType + "; charset=" + encoding;
            }

            ServletActionContext.getResponse().setContentType(contentType);
        }

        return true;
    }

    public String getFileName() {
        return fileName;
    }

    public void setFileName(String fileName) {
        this.fileName = fileName;
    }

    public String getFilePath() {
        return filePath;
    }

    public void setFilePath(String filePath) {
        this.filePath = filePath;
    }

    public String getStaticTemplate() {
        return staticTemplate;
    }

    public void setStaticTemplate(String staticTemplate) {
        this.staticTemplate = staticTemplate;
    }
}
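
The file-writing half of such a result type can be looked at separately from Struts2 and Freemarker. The sketch below uses only the JDK and takes markup that has already been rendered. The class name StaticPageWriter is hypothetical, and writing UTF-8 and rejecting file names that leave the target directory are assumptions of this sketch rather than behavior of the result type above.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: the file-writing half of a static-page result, JDK only.
final class StaticPageWriter {
    private StaticPageWriter() {}

    /**
     * Writes already-rendered markup to baseDir/fileName as UTF-8,
     * creating baseDir if needed, and returns the path of the written file.
     */
    static Path write(Path baseDir, String fileName, String renderedHtml) throws IOException {
        Path target = baseDir.resolve(fileName).normalize();
        if (!target.startsWith(baseDir.normalize())) {
            // Refuse names such as "../x.html" that would escape the static directory.
            throw new IllegalArgumentException("fileName escapes base directory: " + fileName);
        }
        Files.createDirectories(target.getParent());
        // try-with-resources flushes and closes the writer even if writing fails.
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(renderedHtml);
        }
        return target;
    }
}
```

Closing the writer matters here: with a buffered writer, a page that is never flushed can end up truncated or empty on disk.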

 

2. struts.xml

        <action name="staticViewAction" class="com.mobilesoft.esales.webapp.action.StaticViewtAction">
            <result name="success" type="staticview">
                <param name="location">test/freemarkertest.ftl</param>
                <param name="contentType">text/html</param>
                <param name="fileName">${filename}</param>
                <param name="staticTemplate">test/freemarkertest.ftl</param>
                <param name="filePath">static</param>
            </result>
        </action>

 

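Freemarker ObjectWrapper Usage Test

    First, the explanation of ObjectWrapper in the Freemarker Programmer Guide:

    The FreeMarker data container (root) can hold arbitrary objects, not only objects that implement the TemplateModel interface. Why? Because the container implementations FreeMarker provides automatically convert the objects placed in them into objects that implement TemplateModel. For example, if you put a String into the container, it is converted internally into a SimpleScalar.
    When the conversion happens is up to the container's own logic, but at the latest it happens when a subvariable is retrieved, because the methods that retrieve subvariables return TemplateModel rather than Object. For example, SimpleHash, SimpleSequence, and SimpleCollection use the laziest strategy: they convert objects of other types into TemplateModel the first time a subvariable is retrieved.
    As for which types of object can be converted, and into what, the container can handle that itself or delegate it to an ObjectWrapper instance. ObjectWrapper is an interface with a single method, TemplateModel wrap(java.lang.Object obj). You pass in an Object and it returns the corresponding TemplateModel, or throws an exception. The conversion rules are hard-coded in the ObjectWrapper implementation.
    The important ObjectWrapper implementations that FreeMarker provides are:
ObjectWrapper.DEFAULT_WRAPPER: converts String to SimpleScalar, Number to SimpleNumber, List and array to SimpleSequence, Map to SimpleHash, Boolean to TemplateBooleanModel.TRUE/FALSE, and so on. (For objects of other types it calls BEANS_WRAPPER.)
ObjectWrapper.BEANS_WRAPPER: can use reflection to access the properties of arbitrary JavaBeans

    When a HashMap (or SimpleHash) is used in Freemarker and the values of its (key, value) pairs are plain scalar objects (String, Double, and so on), DEFAULT_WRAPPER can be used directly as the ObjectWrapper. Use in the template file is also simple and only needs the following: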

<#list scalarMap?keys as mykey>
    Scalar Map key is :${mykey}
    Scalar Map value is:${scalarMap[mykey]}
</#list>

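    If the Map values are JavaBean objects, however (for example a User bean with the two properties userId and userName), and the template needs to read a bean property with an el-like expression such as ${testmap[key].userId}, the default DEFAULT_WRAPPER cannot be used; ObjectWrapper.BEANS_WRAPPER is required.

    If the template does not need the bean's properties and only needs the object itself, as in ${testmap[key]}, then ObjectWrapper.BEANS_WRAPPER is not required.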
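
    To make "accessing JavaBean properties through reflection" concrete, here is a minimal JDK-only sketch of the idea. It is not FreeMarker's BeansWrapper code: the class BeanPropertyReader, its nested User bean, and the simplified get-prefix lookup are all assumptions made for illustration.

```java
import java.lang.reflect.Method;

// Hypothetical illustration of reflective JavaBean property access (JDK only).
final class BeanPropertyReader {
    private BeanPropertyReader() {}

    /** Minimal bean with the two properties used in the text. */
    public static class User {
        private final String userId;
        private final String userName;

        public User(String userId, String userName) {
            this.userId = userId;
            this.userName = userName;
        }

        public String getUserId() { return userId; }
        public String getUserName() { return userName; }
    }

    /** Reads a property through its public getter, e.g. "userId" resolves to getUserId(). */
    static Object read(Object bean, String property) {
        String getterName = "get"
                + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        try {
            Method getter = bean.getClass().getMethod(getterName);
            getter.setAccessible(true); // the enclosing class may not be public
            return getter.invoke(bean);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No readable property: " + property, e);
        }
    }
}
```

    A wrapper that works this way can resolve an expression such as user.userId for any bean without knowing its class in advance, which is why property access on User values depends on the wrapper in use.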

1. Test case

import java.io.StringWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import freemarker.template.Configuration;
import freemarker.template.ObjectWrapper;
import freemarker.template.SimpleHash;
import freemarker.template.SimpleSequence;
import freemarker.template.Template;
public class FreeMarkerTest {
    public static void main(String[] args){
        FreeMarkerTest test = new FreeMarkerTest();
        test.sayHelloWorld();
    }
    public void sayHelloWorld(){
        Configuration freemarkerCfg = new Configuration();
        freemarkerCfg.setClassForTemplateLoading(this.getClass(), "/");
        freemarkerCfg.setEncoding(Locale.getDefault(), "GBK");
        Template template;
        Locale.setDefault(Locale.SIMPLIFIED_CHINESE);
        try{
            template = freemarkerCfg.getTemplate("HelloWorld.ftl");
            template.setEncoding("GBK");

            User user1=new User();
            user1.setUserId("1");
            user1.setUserName("1");

            User user2=new User();
            user2.setUserId("2");
            user2.setUserName("2");

            User user3=new User();
            user3.setUserId("3");
            user3.setUserName("3");

            User user4=new User();
            user4.setUserId("4");
            user4.setUserName("4");
            User user5=new User();
            user5.setUserId("5");
            user5.setUserName("5");
            User user6=new User();
            user6.setUserId("6");
            user6.setUserName("6");

            List scalarList = new ArrayList();
            scalarList.add("red");
            scalarList.add("green");
            scalarList.add("blue");
            SimpleHash root = new SimpleHash(ObjectWrapper.BEANS_WRAPPER);
            root.put("scalarString", "Scalar String Test");
            root.put("scalarNumber", new Integer(3));
            root.put("scalarObject", new User("33","33"));
            root.put("scalarList", scalarList);
            SimpleHash scalarMap=new SimpleHash(ObjectWrapper.BEANS_WRAPPER);
            root.put("scalarMap", scalarMap);
            scalarMap.put("anotherString", "aaaaaaaa");
            scalarMap.put("anotherNumber", new Double(3.14));
            SimpleSequence userList=new SimpleSequence(ObjectWrapper.BEANS_WRAPPER);
            root.put("userList", userList);
            userList.add(user1);
            userList.add(user2);
            userList.add(user3);
            userList.add(user4);
            userList.add(user5);
            userList.add(user6);
            SimpleHash userMap=new SimpleHash(ObjectWrapper.BEANS_WRAPPER);
            root.put("userMap", userMap);
            userMap.put("1", user1);
            userMap.put("2", user2);
            userMap.put("3", user3);
            userMap.put("4", user4);
            userMap.put("5", user5);
            userMap.put("6", user6);

            StringWriter writer = new StringWriter();
            template.process(root, writer);
            System.out.println(writer.toString());
        }catch(Exception e){
            e.printStackTrace();
        }}
}

2. HelloWorld.ftl

Scalar String:${scalarString}
Scalar Number:${scalarNumber}
Object is:${scalarObject}

List example - list elements are scalar objects:

<#list scalarList as value0>
    Scalar List value:${value0}
</#list>

List example - list elements are User objects:

<#list userList as listUser>
    List object User Id value:${listUser.userId}
</#list>

Map example - map values are scalars:

<#list scalarMap?keys as mykey>
    Scalar Map key is :${mykey}
    Scalar Map value is:${scalarMap[mykey]}
</#list>

Map example - map values are User objects:

<#list userMap?keys as key1>
    <#assign mapUser="${userMap[key1]}" >
    User Object is :${mapUser}
    <#--
    The following does not work: mapUser was assigned through the string
    interpolation "${userMap[key1]}", so it holds a string rather than the User bean.
    User is :${mapUser.userId} <br>
    -->
</#list>

3. User.java

public class User {
    private String userId;
    private String userName;
    public User(){
    }
    public User(String userId,String userName){
        this.userId = userId;
        this.userName = userName;
    }
    public String getUserId() {
        return userId;
    }
    public void setUserId(String userId) {
        this.userId = userId;
    }
    public String getUserName() {
        return userName;
    }
    public void setUserName(String userName) {
        this.userName = userName;
    }
}

 

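Default selected values for the select boxes of the Struts2 doubleselect tag

    In our project, the linked province/city drop-downs use the Struts2 doubleselect tag, and the business requirements call for a dynamic default value (selected) in each of the two drop-downs.

Business scenario:

    In agent management, the province and city an agent belongs to are chosen when the agent is added, and sales staff are then added for that agent. When an agent's salesperson sells a product to a customer who is not yet in the customer database, the customer has to be added. At that point, the default selected province and city for the new customer should be the agent's province and city.

 

Main implementation logic:

    Use the value and doubleValue attributes of the doubleselect tag. In the action, define get and set methods for the default values of the two select boxes (defaultItem and doubleDefaultItem in the example). In the action method, call the set methods according to the business logic (when a customer is added, the customer's province and city default to the salesperson's) to set the two defaults, and then read them in the page through the value and doubleValue attributes.

Implementation example: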

1. Action

package com.mobilesoft.esales.webapp.action;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

import org.apache.log4j.Logger;

public class DoubleListAction extends BaseAction {

    private static final Logger logger = Logger.getLogger(DoubleListAction.class);

    private String defaultItem;
    private String doubleDefaultItem;

    public String execute() {
        return SUCCESS;
    }

    public String doubleSelectTest(){
        Map map=new HashMap();

        ArrayList list1=new ArrayList();
        list1.add("11");
        list1.add("12");
        list1.add("13");
        map.put("1", list1);

        ArrayList list2=new ArrayList();
        list2.add("21");
        list2.add("22");
        list2.add("23");
        map.put("2", list2);

        ArrayList list3=new ArrayList();
        list3.add("31");
        list3.add("32");
        list3.add("33");
        map.put("3", list3);

        // Defaults for the first and second select boxes
        setDefaultItem("2");
        setDoubleDefaultItem("23");

        getRequest().setAttribute("defaultItem", getDefaultItem());
        getRequest().setAttribute("doubleDefaultItem", getDoubleDefaultItem());
        getRequest().setAttribute("map", map);

        return SUCCESS;
    }

    public String getDefaultItem() {
        return defaultItem;
    }

    public void setDefaultItem(String defaultItem) {
        this.defaultItem = defaultItem;
    }

    public String getDoubleDefaultItem() {
        return doubleDefaultItem;
    }

    public void setDoubleDefaultItem(String doubleDefaultItem) {
        this.doubleDefaultItem = doubleDefaultItem;
    }
}

2. doubleselect.jsp

<%@ taglib prefix="s" uri="/struts-tags" %>
<%@ page language="java" errorPage="/error.jsp" pageEncoding="GBK" contentType="text/html;charset=GBK" %>
<html>
<head>
<title>Struts 2 Cool Tags - &lt;s:doubleselect/ &gt;</title>
<s:head />
</head>
<body>
<h2>Doubleselect default selected value demo:</h2>
<s:form name="form1">
<s:doubleselect label="Default value test"
    list="#request.map.keySet()"        doubleList="#request.map[top]"
    name="doubleselect1"                doubleName="doubleselect2"
    value="#request.defaultItem"        doubleValue="#request.doubleDefaultItem"
    formName="form1"
/>
</s:form>
</body>
</html>

3. struts.xml

<action name="doubleSelectTest" method="doubleSelectTest" class="com.mobilesoft.esales.webapp.action.DoubleListAction">
    <result name="success">test/doubleselect.jsp</result>
</action>

 


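htmlparser Usage Guide

 

    I need to build a vertical search engine, so I compared nekohtml and htmlparser. nekohtml seems to have a better reputation than htmlparser for fault tolerance and performance (htmlunit also uses nekohtml), but it appears to have fewer test cases and less documentation, and htmlparser basically meets a vertical search engine's needs for processing and analyzing pages. So I am studying htmlparser first, and will look into nekohtml and the mozilla html parser when time permits.

    The official site describes what htmlparser does most clearly: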

    HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy to use JavaBeans. It is a fast, robust and well tested package.

    The two fundamental use-cases that are handled by the parser are extraction and transformation (the syntheses use-case, where HTML pages are created from scratch, is better handled by other tools closer to the source of data). While prior versions concentrated on data extraction from web pages, Version 1.4 of the HTMLParser has substantial improvements in the area of transforming web pages, with simplified tag creation and editing, and verbatim toHtml() method output.

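    The focus here is on extraction; transformation will be covered when time permits.

1. Data structures htmlparser uses to represent an html page

[Figure: node]

As the figure shows, HtmlParser uses the classic Composite pattern, describing the elements of an HTML page with RemarkNode, TextNode, TagNode, AbstractNode, and Tag.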

  • org.htmlparser.Node:

    The Node interface defines the typical operations on a node in a tree structure, including:

    Converting a node to html text or plain text: toPlainTextString, toHtml

    Typical tree traversal: getParent, getChildren, getFirstChild, getLastChild, getPreviousSibling, getNextSibling, getText

    Getting the Page object at the top of the node's tree: getPage

    Getting a node's start and end positions: getStartPosition, getEndPosition

    Visiting nodes with a Visitor: accept (NodeVisitor visitor)

    Filtering: collectInto (NodeList list, NodeFilter filter)

    Object methods: toString, clone

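  • org.htmlparser.nodes.AbstractNode

    AbstractNode is the abstract base class from which the HTML tree is built; it implements the Node interface.

    In htmlparser, nodes fall into three kinds:

    RemarkNode: represents a comment in the Html

    TagNode: a tag node

    TextNode: a text node

    All three inherit from AbstractNode.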

  • org.htmlparser.nodes.TagNode:

    TagNode is the base class of all tags and sits at the core of htmlparser's HTML handling. Tags fall into two groups: composite nodes (CompositeTag), which can contain other tags, and leaf tags, which cannot.

    Composite nodes (CompositeTag):

        AppletTag, BodyTag, Bullet, BulletList, DefinitionList, DefinitionListBullet, Div, FormTag, FrameSetTag, HeadingTag,

        HeadTag, Html, LabelTag, LinkTag, ObjectTag, ParagraphTag, ScriptTag, SelectTag, Span, StyleTag, TableColumn,

        TableHeader, TableRow, TableTag, TextareaTag, TitleTag

    Leaf tags:

        BaseHrefTag, DoctypeTag, FrameTag, ImageTag, InputTag, JspTag, MetaTag, ProcessingInstructionTag
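
The Composite structure described above can be illustrated with a small self-contained model. This is only a sketch and not htmlparser's own code: SketchNode, SketchText and SketchTag are hypothetical names standing in for Node, TextNode and a composite tag, and only a simplified toPlainTextString/toHtml pair is modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the Composite pattern described above.
// These are NOT htmlparser classes; names and methods are illustrative only.
interface SketchNode {
    String toPlainTextString(); // text content without markup
    String toHtml();            // markup plus content
}

// Leaf: plays the role of a text node.
class SketchText implements SketchNode {
    private final String text;

    SketchText(String text) {
        this.text = text;
    }

    public String toPlainTextString() {
        return text;
    }

    public String toHtml() {
        return text;
    }
}

// Composite: plays the role of a tag that may contain other nodes.
class SketchTag implements SketchNode {
    private final String name;
    private final List<SketchNode> children = new ArrayList<>();

    SketchTag(String name) {
        this.name = name;
    }

    SketchTag add(SketchNode child) {
        children.add(child);
        return this; // allow chained calls
    }

    // A composite answers by delegating to its children.
    public String toPlainTextString() {
        StringBuilder sb = new StringBuilder();
        for (SketchNode child : children) {
            sb.append(child.toPlainTextString());
        }
        return sb.toString();
    }

    public String toHtml() {
        StringBuilder sb = new StringBuilder("<" + name + ">");
        for (SketchNode child : children) {
            sb.append(child.toHtml());
        }
        return sb.append("</" + name + ">").toString();
    }
}

class CompositeSketchDemo {
    // Builds <p>Hello, <b>world</b></p> from leaf and composite nodes.
    static SketchNode sample() {
        return new SketchTag("p")
                .add(new SketchText("Hello, "))
                .add(new SketchTag("b").add(new SketchText("world")));
    }

    public static void main(String[] args) {
        SketchNode root = sample();
        System.out.println(root.toHtml());
        System.out.println(root.toPlainTextString());
    }
}
```

Callers treat leaves and composites uniformly through the one interface, which is the point of the pattern: code that asks a node for its text does not need to know whether the node has children.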

2. How htmlparser processes an HTML page

The main approaches are:

  • Visiting the HTML with a Visitor

try {
    Parser parser = new Parser();
    parser.setURL("http://www.google.com");
    parser.setEncoding(parser.getEncoding());
    NodeVisitor visitor = new NodeVisitor() {
        public void visitTag(Tag tag) {
            logger.fatal("testVisitorAll()  Tag name is :"
                    + tag.getTagName() + " \n Class is :"
                    + tag.getClass());
        }

    };

    parser.visitAllNodesWith(visitor);
} catch (ParserException e) {
    e.printStackTrace();
}

  • Accessing the HTML with a Filter

try {

    NodeFilter filter = new NodeClassFilter(LinkTag.class);
    Parser parser = new Parser();
    parser.setURL("http://www.google.com");
    parser.setEncoding(parser.getEncoding());
    NodeList list = parser.extractAllNodesThatMatch(filter);
    for (int i = 0; i < list.size(); i++) {
        LinkTag node = (LinkTag) list.elementAt(i);
        logger.fatal("testLinkTag() Link is :" + node.extractLink());
    }
} catch (Exception e) {
    e.printStackTrace();
}

  • Using org.htmlparser.beans

htmlparser also wraps some commonly used operations in org.htmlparser.beans to simplify things, for example:

LinkBean linkBean = new LinkBean();
linkBean.setURL("http://www.google.com");
URL[] urls = linkBean.getLinks();

for (int i = 0; i < urls.length; i++) {
    URL url = urls[i];
    logger.fatal("testLinkBean() -url  is :" + url);
}

 

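3. Key packages in htmlparser

    htmlparser's core code base is actually not large, so reading it carefully makes up for the thin documentation. The code comments and unit tests are also fairly complete and help in learning how htmlparser is used.


3.1 org.htmlparser

    Defines htmlparser's basic classes, the most important of which is Parser.

    Parser is the central class of htmlparser. It can be created through the static factory method Parser.createParser (String html, String charset) or through the constructors Parser (), Parser (Lexer lexer, ParserFeedback fb), Parser (URLConnection connection, ParserFeedback fb), Parser (String resource, ParserFeedback feedback) and Parser (String resource).

  The usage and meaning of each can be read directly from the source, which is easy to follow.

  Commonly used Parser methods:

  •   elements: iterates over the nodes

    Parser parser = new Parser ("http://www.google.com");
    for (NodeIterator i = parser.elements (); i.hasMoreNodes (); )
      processMyNodes (i.nextNode ());

  • parse (NodeFilter filter): extracts nodes with a NodeFilter
  • visitAllNodesWith (NodeVisitor visitor): traverses nodes with a NodeVisitor
  • extractAllNodesThatMatch (NodeFilter filter): extracts nodes with a NodeFilter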

3.2 org.htmlparser.beans

    Wraps the Visitor and Filter approaches and defines beans for operating on common HTML elements, which simplifies extracting those elements.

    Includes: FilterBean, HTMLLinkBean, HTMLTextBean, LinkBean, StringBean, BeanyBaby and others.

3.3 org.htmlparser.nodes

    Defines the basic nodes, including AbstractNode, RemarkNode, TagNode and TextNode.

3.4 org.htmlparser.tags

    Defines htmlparser's tags.

3.5 org.htmlparser.filters

    Defines the filters that htmlparser provides. They are mainly used through extractAllNodesThatMatch (NodeFilter filter) to select elements of a given kind from an HTML page, and include: AndFilter, CssSelectorNodeFilter, HasAttributeFilter, HasChildFilter, HasParentFilter, HasSiblingFilter, IsEqualFilter, LinkRegexFilter, LinkStringFilter, NodeClassFilter, NotFilter, OrFilter, RegexFilter, StringFilter, TagNameFilter, XorFilter
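
The way filters such as AndFilter, OrFilter and NotFilter compose simpler filters can be sketched without the library. The model below is an assumption-laden illustration, not htmlparser's implementation: SketchElement, SketchFilter and SketchFilters are hypothetical names, and an element is reduced to a tag name plus an optional id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parsed element; NOT an htmlparser class.
class SketchElement {
    final String tagName;
    final String id; // null when the element has no id attribute

    SketchElement(String tagName, String id) {
        this.tagName = tagName;
        this.id = id;
    }
}

// One accept() decision per element, the same shape as a node filter.
interface SketchFilter {
    boolean accept(SketchElement element);
}

class SketchFilters {
    static SketchFilter tagName(String name) {
        return e -> e.tagName.equalsIgnoreCase(name);
    }

    static SketchFilter hasId() {
        return e -> e.id != null;
    }

    // Accepts only when every predicate accepts.
    static SketchFilter and(SketchFilter... predicates) {
        return e -> {
            for (SketchFilter p : predicates) {
                if (!p.accept(e)) return false;
            }
            return true;
        };
    }

    // Accepts when at least one predicate accepts.
    static SketchFilter or(SketchFilter... predicates) {
        return e -> {
            for (SketchFilter p : predicates) {
                if (p.accept(e)) return true;
            }
            return false;
        };
    }

    static SketchFilter not(SketchFilter predicate) {
        return e -> !predicate.accept(e);
    }

    // Returns the tag names of all elements the filter accepts, in order.
    static List<String> extract(List<SketchElement> elements, SketchFilter filter) {
        List<String> matched = new ArrayList<>();
        for (SketchElement e : elements) {
            if (filter.accept(e)) matched.add(e.tagName);
        }
        return matched;
    }

    static List<SketchElement> sample() {
        return Arrays.asList(
                new SketchElement("input", "text1"),
                new SketchElement("select", null),
                new SketchElement("a", null),
                new SketchElement("table", "table1"));
    }

    public static void main(String[] args) {
        List<SketchElement> page = sample();
        // Elements that are either input or select
        System.out.println(extract(page, or(tagName("INPUT"), tagName("SELECT"))));
        // Elements that have an id and are not input
        System.out.println(extract(page, and(hasId(), not(tagName("input")))));
    }
}
```

Because every combinator returns another filter, arbitrarily nested conditions can be built from a handful of primitives and passed to a single extraction routine.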

3.6 org.htmlparser.visitors

   Defines the visitors that htmlparser provides. They are mainly used through visitAllNodesWith (NodeVisitor visitor) to traverse the elements of an HTML page, and include: HtmlPage, LinkFindingVisitor, NodeVisitor, ObjectFindingVisitor, StringFindingVisitor, TagFindingVisitor, TextExtractingVisitor, UrlModifyingVisitor
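
The visitor style of traversal can likewise be sketched in a few lines. This is a simplified model under stated assumptions rather than htmlparser's code: SketchVisitor, VisitableNode and TagCollector are hypothetical names, and the callbacks take plain strings instead of tag and text objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical visitor callbacks; NOT htmlparser's NodeVisitor.
// Empty default bodies let a subclass override only what it needs.
abstract class SketchVisitor {
    void visitTag(String tagName) {}
    void visitText(String text) {}
    void visitEndTag(String tagName) {}
}

// Minimal tree element: a tag with children, or a text leaf when tagName is null.
class VisitableNode {
    final String tagName; // null for text leaves
    final String text;    // null for tags
    final List<VisitableNode> children = new ArrayList<>();

    private VisitableNode(String tagName, String text) {
        this.tagName = tagName;
        this.text = text;
    }

    static VisitableNode ofTag(String name, VisitableNode... kids) {
        VisitableNode node = new VisitableNode(name, null);
        node.children.addAll(Arrays.asList(kids));
        return node;
    }

    static VisitableNode ofText(String text) {
        return new VisitableNode(null, text);
    }

    // Depth-first traversal that drives the visitor callbacks.
    void accept(SketchVisitor visitor) {
        if (tagName == null) {
            visitor.visitText(text);
            return;
        }
        visitor.visitTag(tagName);
        for (VisitableNode child : children) {
            child.accept(visitor);
        }
        visitor.visitEndTag(tagName);
    }
}

// Example visitor: records the name of every tag in document order.
class TagCollector extends SketchVisitor {
    final List<String> names = new ArrayList<>();

    @Override
    void visitTag(String tagName) {
        names.add(tagName);
    }
}

class VisitorSketchDemo {
    static VisitableNode sample() {
        return VisitableNode.ofTag("html",
                VisitableNode.ofTag("head",
                        VisitableNode.ofTag("title", VisitableNode.ofText("Link Test"))),
                VisitableNode.ofTag("body",
                        VisitableNode.ofTag("a", VisitableNode.ofText("yeeach.com"))));
    }

    public static void main(String[] args) {
        TagCollector collector = new TagCollector();
        sample().accept(collector);
        System.out.println(collector.names);
    }
}
```

The traversal logic lives in one place (accept), while each visitor holds only the behaviour and state for its own task, which is why a library can ship many small visitors over the same tree.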

 

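3.7 org.htmlparser.parserapplications

   Defines some practical tools, including LinkExtractor, SiteCapturer, StringExtractor and WikiCapturer. These classes can also serve as htmlparser usage samples.

3.8 org.htmlparser.tests

   Unit tests for the various features; they can also serve as htmlparser usage samples.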

 

4. htmlparser usage examples

 

import java.net.URL;

import junit.framework.TestCase;

import org.apache.log4j.Logger;
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.Tag;
import org.htmlparser.beans.LinkBean;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.filters.OrFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.tags.HeadTag;
import org.htmlparser.tags.ImageTag;
import org.htmlparser.tags.InputTag;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.tags.OptionTag;
import org.htmlparser.tags.SelectTag;
import org.htmlparser.tags.TableColumn;
import org.htmlparser.tags.TableRow;
import org.htmlparser.tags.TableTag;
import org.htmlparser.tags.TitleTag;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.HtmlPage;
import org.htmlparser.visitors.NodeVisitor;
import org.htmlparser.visitors.ObjectFindingVisitor;

public class ParserTestCase extends TestCase {

    private static final Logger logger = Logger.getLogger(ParserTestCase.class);

    public ParserTestCase(String name) {
        super(name);
    }
    /*
     * Tests ObjectFindingVisitor usage
     */
    public void testImageVisitor() {
        try {
            ObjectFindingVisitor visitor = new ObjectFindingVisitor(
                    ImageTag.class);
            Parser parser = new Parser();
            parser.setURL("http://www.google.com");
            parser.setEncoding(parser.getEncoding());
            parser.visitAllNodesWith(visitor);
            Node[] nodes = visitor.getTags();
            for (int i = 0; i < nodes.length; i++) {
                ImageTag imgLink = (ImageTag) nodes[i];
                logger.fatal("testImageVisitor() ImageURL = "
                        + imgLink.getImageURL());
                logger.fatal("testImageVisitor() ImageLocation = "
                        + imgLink.extractImageLocn());
                logger.fatal("testImageVisitor() SRC = "
                        + imgLink.getAttribute("SRC"));
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
    /*
     * Tests TagNameFilter usage
     */
    public void testNodeFilter() {
        try {
            NodeFilter filter = new TagNameFilter("IMG");
            Parser parser = new Parser();
            parser.setURL("http://www.google.com");
            parser.setEncoding(parser.getEncoding());
            NodeList list = parser.extractAllNodesThatMatch(filter);
            for (int i = 0; i < list.size(); i++) {
                logger.fatal("testNodeFilter() " + list.elementAt(i).toHtml());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

    }
    /*
     * Tests NodeClassFilter usage
     */
    public void testLinkTag() {
        try {

            NodeFilter filter = new NodeClassFilter(LinkTag.class);
            Parser parser = new Parser();
            parser.setURL("http://www.google.com");
            parser.setEncoding(parser.getEncoding());
            NodeList list = parser.extractAllNodesThatMatch(filter);
            for (int i = 0; i < list.size(); i++) {
                LinkTag node = (LinkTag) list.elementAt(i);
                logger.fatal("testLinkTag() Link is :" + node.extractLink());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

    }
    /*
     * Tests parsing of <link href='' type='text/css' rel='stylesheet' /> elements
     */
    public void testLinkCSS() {
        try {

            Parser parser = new Parser();
            parser
                    .setInputHTML("<head><title>Link Test</title>"
                            + "<link href='/test01/css.css' type='text/css' rel='stylesheet' />"
                            + "<link href='/test02/css.css' type='text/css' rel='stylesheet' />"
                            + "</head>" + "<body>");
            parser.setEncoding(parser.getEncoding());

            for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
                Node node = e.nextNode();
                logger
                        .fatal("testLinkCSS()" + node.getText()
                                + node.getClass());

            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    /*
     * Tests OrFilter usage
     */
    public void testOrFilter() {
        NodeFilter inputFilter = new NodeClassFilter(InputTag.class);
        NodeFilter selectFilter = new NodeClassFilter(SelectTag.class);
        NodeList nodeList = null;

        try {
            Parser parser = new Parser();
            parser
                    .setInputHTML("<head><title>OrFilter Test</title>"
                            + "<link href='/test01/css.css' type='text/css' rel='stylesheet' />"
                            + "<link href='/test02/css.css' type='text/css' rel='stylesheet' />"
                            + "</head>"
                            + "<body>"
                            + "<input type='text' value='text1' name='text1'/>"
                            + "<input type='text' value='text2' name='text2'/>"
                            + "<select><option id='1'>1</option><option id='2'>2</option><option id='3'></option></select>"
                            + "<a href='http://www.yeeach.com'>yeeach.com</a>"
                            + "</body>");

            parser.setEncoding(parser.getEncoding());
            OrFilter lastFilter = new OrFilter();
            lastFilter.setPredicates(new NodeFilter[] { selectFilter,
                    inputFilter });
            nodeList = parser.parse(lastFilter);
            // Valid indices run from 0 to size() - 1, so the bound is <, not <=
            for (int i = 0; i < nodeList.size(); i++) {
                if (nodeList.elementAt(i) instanceof InputTag) {
                    InputTag tag = (InputTag) nodeList.elementAt(i);
                    logger.fatal("OrFilter tag name is :" + tag.getTagName()
                            + " ,tag value is:" + tag.getAttribute("value"));
                }
                if (nodeList.elementAt(i) instanceof SelectTag) {
                    SelectTag tag = (SelectTag) nodeList.elementAt(i);
                    NodeList list = tag.getChildren();

                    for (int j = 0; j < list.size(); j++) {
                        // Skip children that are not options (e.g. whitespace text)
                        if (!(list.elementAt(j) instanceof OptionTag)) {
                            continue;
                        }
                        OptionTag option = (OptionTag) list.elementAt(j);
                        logger
                                .fatal("OrFilter Option"
                                        + option.getOptionText());
                    }

                }
            }

        } catch (ParserException e) {
            e.printStackTrace();
        }
    }
    /*
     * Tests parsing of <table><tr><td></td></tr></table>
     */
    public void testTable() {
        Parser myParser;
        NodeList nodeList = null;
        myParser = Parser.createParser("<body> " + "<table id='table1' >"
                + "<tr><td>1-11</td><td>1-12</td><td>1-13</td>"
                + "<tr><td>1-21</td><td>1-22</td><td>1-23</td>"
                + "<tr><td>1-31</td><td>1-32</td><td>1-33</td></table>"
                + "<table id='table2' >"
                + "<tr><td>2-11</td><td>2-12</td><td>2-13</td>"
                + "<tr><td>2-21</td><td>2-22</td><td>2-23</td>"
                + "<tr><td>2-31</td><td>2-32</td><td>2-33</td></table>"
                + "</body>", "GBK");
        NodeFilter tableFilter = new NodeClassFilter(TableTag.class);
        OrFilter lastFilter = new OrFilter();
        lastFilter.setPredicates(new NodeFilter[] { tableFilter });
        try {
            nodeList = myParser.parse(lastFilter);
            // Valid indices run from 0 to size() - 1, so the bound is <, not <=
            for (int i = 0; i < nodeList.size(); i++) {
                if (nodeList.elementAt(i) instanceof TableTag) {
                    TableTag tag = (TableTag) nodeList.elementAt(i);
                    TableRow[] rows = tag.getRows();

                    for (int j = 0; j < rows.length; j++) {
                        TableRow tr = rows[j];
                        TableColumn[] td = tr.getColumns();
                        for (int k = 0; k < td.length; k++) {
                            logger.fatal("<td>" + td[k].toPlainTextString());
                        }

                    }

                }
            }

        } catch (ParserException e) {
            e.printStackTrace();
        }
    }
    /*
     * Tests NodeVisitor usage by traversing all nodes
     */
    public void testVisitorAll() {
        try {
            Parser parser = new Parser();
            parser.setURL("http://www.google.com");
            parser.setEncoding(parser.getEncoding());
            NodeVisitor visitor = new NodeVisitor() {
                public void visitTag(Tag tag) {
                    logger.fatal("testVisitorAll()  Tag name is :"
                            + tag.getTagName() + " \n Class is :"
                            + tag.getClass());
                }

            };

            parser.visitAllNodesWith(visitor);
        } catch (ParserException e) {
            e.printStackTrace();
        }
    }
    /*
     * Tests a NodeVisitor that handles specific tag types
     */
    public void testTagVisitor() {
        try {

            Parser parser = new Parser(
                    "<head><title>dddd</title>"
                            + "<link href='/test01/css.css' type='text/css' rel='stylesheet' />"
                            + "<link href='/test02/css.css' type='text/css' rel='stylesheet' />"
                            + "</head>" + "<body>"
                            + "<a href='http://www.yeeach.com'>yeeach.com</a>"
                            + "</body>");
            NodeVisitor visitor = new NodeVisitor() {
                public void visitTag(Tag tag) {
                    if (tag instanceof HeadTag) {
                        logger.fatal("visitTag() HeadTag : Tag name is :"
                                + tag.getTagName() + " \n Class is :"
                                + tag.getClass() + "\n Text is :"
                                + tag.getText());
                    } else if (tag instanceof TitleTag) {
                        logger.fatal("visitTag() TitleTag : Tag name is :"
                                + tag.getTagName() + " \n Class is :"
                                + tag.getClass() + "\n Text is :"
                                + tag.getText());

                    } else if (tag instanceof LinkTag) {
                        logger.fatal("visitTag() LinkTag : Tag name is :"
                                + tag.getTagName() + " \n Class is :"
                                + tag.getClass() + "\n Text is :"
                                + tag.getText() + " \n getAttribute is :"
                                + tag.getAttribute("href"));
                    } else {
                        logger.fatal("visitTag() : Tag name is :"
                                + tag.getTagName() + " \n Class is :"
                                + tag.getClass() + "\n Text is :"
                                + tag.getText());
                    }

                }

            };

            parser.visitAllNodesWith(visitor);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    /*
     * Tests HtmlPage usage
     */
    public void testHtmlPage() {
        String inputHTML = "<html>" + "<head>"
                + "<title>Welcome to the HTMLParser website</title>"
                + "</head>" + "<body>" + "Welcome to HTMLParser"
                + "<table id='table1' >"
                + "<tr><td>1-11</td><td>1-12</td><td>1-13</td>"
                + "<tr><td>1-21</td><td>1-22</td><td>1-23</td>"
                + "<tr><td>1-31</td><td>1-32</td><td>1-33</td></table>"
                + "<table id='table2' >"
                + "<tr><td>2-11</td><td>2-12</td><td>2-13</td>"
                + "<tr><td>2-21</td><td>2-22</td><td>2-23</td>"
                + "<tr><td>2-31</td><td>2-32</td><td>2-33</td></table>"
                + "</body>" + "</html>";
        Parser parser = new Parser();
        try {
            parser.setInputHTML(inputHTML);
            // Pass the encoding here, not the URL as the original code did
            parser.setEncoding(parser.getEncoding());
            HtmlPage page = new HtmlPage(parser);
            parser.visitAllNodesWith(page);
            logger.fatal("testHtmlPage -title is :" + page.getTitle());
            NodeList list = page.getBody();

            for (NodeIterator iterator = list.elements(); iterator
                    .hasMoreNodes();) {
                Node node = iterator.nextNode();
                logger.fatal("testHtmlPage -node  is :" + node.toHtml());
            }

        } catch (ParserException e) {
            e.printStackTrace();
        }
    }
    /*
     * Tests LinkBean usage
     */
    public void testLinkBean() {
        LinkBean linkBean = new LinkBean();
        linkBean.setURL("http://www.google.com");
        URL[] urls = linkBean.getLinks();

        for (int i = 0; i < urls.length; i++) {
            URL url = urls[i];
            logger.fatal("testLinkBean() -url  is :" + url);
        }

    }

}

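5. Related projects

nekohtml: better regarded than htmlparser. It normalizes HTML into a well-formed XML document that is then processed with xerces, but its documentation is sparse.

mozilla htmlparser: the HTML parser used by the http://www.dapper.net/ site. It has been open sourced, is based on Mozilla's parser, and is worth studying.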

http://jerichohtml.sourceforge.net/

http://htmlcleaner.sourceforge.net/

http://html.xamjwg.org/cobra.jsp

http://jrex.mozdev.org/

https://xhtmlrenderer.dev.java.net

For other HTML parsers, see these roundup articles:

http://www.manageability.org/blog/stuff/screen-scraping-tools-written-in-java/view

http://java-source.net/open-source/html-parsers

http://www.open-open.com/30.htm

 

6. References

http://www.blogjava.net/lostfire/archive/2006/07/02/56212.html

http://blog.csdn.net/scud/archive/2005/08/11/451397.aspx

http://chasethedevil.blogspot.com/2006/05/java-html-parsing-example-with.html

http://javaboutique.internet.com/tutorials/HTMLParser/

 
