11-RegEx

Regular expressions in Java

1. Replacement

1. Multiple matches in a single string

replaceAll() replaces every match in the string, not just the first one.
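The String Javadoc states that str.replaceAll(regex, repl) gives the same result as Pattern.compile(regex).matcher(str).replaceAll(repl). The following minimal sketch compares the two forms. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class ReplaceEquivalence {
    // Compiled once and reused; a{1} means "exactly one a", so it behaves like plain a.
    private static final Pattern SINGLE_A = Pattern.compile("a{1}");

    // Shorthand form: String.replaceAll() compiles the regex on every call.
    static String viaString(String input) {
        return input.replaceAll("a{1}", "x");
    }

    // Explicit form: reuse the precompiled Pattern.
    static String viaPattern(String input) {
        return SINGLE_A.matcher(input).replaceAll("x");
    }

    public static void main(String[] args) {
        System.out.println(viaString("aab"));   // xxb
        System.out.println(viaPattern("aab"));  // xxb
        System.out.println(viaPattern("aba"));  // xbx
    }
}
```

Compiling the Pattern once is the usual choice when the same regex is applied many times, since String.replaceAll() recompiles it on each call.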

1. Replacing a

"aab".replaceAll("a{1}", "x");	// xxb
"aba".replaceAll("a{1}", "x");	// xbx

2. Replacing aa

"abaaabaaaba".replaceAll("a{2}", "x");	// abxabxaba
"abaabaaaaba".replaceAll("a{2}", "x");	// abxbxxba

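The aa results follow from the scan order. The engine searches from left to right, and each new search starts where the previous match ended, so matches never overlap. In "aaa", the first two a's form one match and the third a is left unchanged. The following sketch shows this and contrasts it with String.replaceFirst(), which replaces only the first match. The class and method names are hypothetical.

```java
public class NonOverlappingReplace {
    // Replace every non-overlapping pair of a's.
    static String replacePairs(String input) {
        return input.replaceAll("a{2}", "x");
    }

    // Replace only the first pair of a's.
    static String replaceFirstPair(String input) {
        return input.replaceFirst("a{2}", "x");
    }

    public static void main(String[] args) {
        System.out.println(replacePairs("aaa"));       // xa
        System.out.println(replacePairs("aaaa"));      // xx
        System.out.println(replaceFirstPair("aaaa"));  // xaa
    }
}
```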
2. Extraction

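Pattern: a compiled regular expression, created with Pattern.compile()
Matcher: searches an input string for subsequences that match the pattern; obtained with pattern.matcher(input)
find(): attempts to find the next matching subsequence and returns true if one is found
start(): returns the start index of the current match
end(): returns the end index of the current match (exclusive, i.e. the index after its last character)
group(): returns the text of the current match. Each successful find() moves to exactly one match, so extracting several matches requires calling find() in a loop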
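The following small sketch shows how find(), start(), end() and group() work together. MatchPositions and its describeMatches() helper are hypothetical names. The Matcher methods are the standard java.util.regex ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchPositions {
    // Returns "text@[start,end)" for every match of regex in input.
    static List<String> describeMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {  // each call resumes after the previous match
            out.add(matcher.group() + "@[" + matcher.start() + "," + matcher.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("a", "aba"));  // [a@[0,1), a@[2,3)]
    }
}
```

Because end() is exclusive, input.substring(matcher.start(), matcher.end()) is the same text as matcher.group().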

1. Extracting a

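import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compile the pattern and create a matcher over the input "ab"
Matcher matcher = Pattern.compile("(a)").matcher("ab");
if (matcher.find()) {          // find() returns false when there is no match
    System.out.println(matcher.group());
}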

// -------- Output
a
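The if (matcher.find()) guard matters. According to the Matcher Javadoc, group() throws IllegalStateException if no match has been attempted or if the last match attempt failed. The following sketch shows both the guarded and the unguarded call. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNeedsMatch {
    // Guarded: returns the first match, or null when there is none.
    static String firstMatchOrNull(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    // Unguarded: calls group() even if find() failed, which throws.
    static boolean groupThrowsWithoutMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        matcher.find();  // result deliberately ignored
        try {
            matcher.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatchOrNull("(a)", "ab"));         // a
        System.out.println(firstMatchOrNull("(a)", "bc"));         // null
        System.out.println(groupThrowsWithoutMatch("(a)", "bc"));  // true
    }
}
```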

2. Extracting multiple a's

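Matcher matcher = Pattern.compile("(a)").matcher("aba");
int matcher_start = 0;
// find(int) resets the matcher and searches from the given index
while (matcher.find(matcher_start)) {
    System.out.println(matcher.group(1));   // text of capturing group 1
    matcher_start = matcher.end();          // next search starts after this match
}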


// -------- Output
a
a
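The loop above passes matcher.end() back into find(int). That works, but find(int) resets the matcher on every call. A plain find() already resumes right after the previous match, so the index variable is not needed. There is one caveat with the find(int) style, which is an inference from how find(int) is documented: if the pattern can match an empty string, end() equals the start index and find(end) keeps finding the same empty match. The following sketch shows the plain loop and, for Java 9 and later, Matcher.results(). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FindAllLoop {
    // Plain find(): each call resumes at the first character after the previous match.
    static List<String> findAll(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    // Java 9+: Matcher.results() streams the same matches as MatchResult objects.
    static List<String> findAllStream(String regex, String input) {
        return Pattern.compile(regex).matcher(input)
                .results()
                .map(m -> m.group(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(findAll("(a)", "aba"));        // [a, a]
        System.out.println(findAllStream("(a)", "aba"));  // [a, a]
    }
}
```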

3. Extracting more complex content

Extract the contents of several XML tags from one string:

String txt = "abc123<root>这是root1</root><root>这是root2</root>";
Matcher matcher = Pattern.compile("<root>(.*?)</root>").matcher(txt);
int matcher_start = 0;
while (matcher.find(matcher_start)){
    System.out.println(matcher.group(1));
    matcher_start = matcher.end();
}

// -------- Output
这是root1
这是root2
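Extracting tags this way has one limitation. By default, . does not match line terminators, so <root>(.*?)</root> finds nothing if a tag's content spans several lines. Passing the Pattern.DOTALL flag changes this behavior. The following sketch assumes flat, unnested tags. The class and helper names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineTags {
    // Extracts the text between <root> and </root>; flags go straight to Pattern.compile.
    static List<String> rootContents(String txt, int flags) {
        List<String> out = new ArrayList<>();
        Matcher matcher = Pattern.compile("<root>(.*?)</root>", flags).matcher(txt);
        while (matcher.find()) {
            out.add(matcher.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String txt = "<root>line1\nline2</root>";
        System.out.println(rootContents(txt, 0).size());               // 0: '.' stops at '\n'
        System.out.println(rootContents(txt, Pattern.DOTALL).size());  // 1
    }
}
```

For nested or otherwise non-trivial XML, an XML parser is a more reliable tool than a regex.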

3. Notes on using group()

1. Greedy vs. reluctant (lazy) matching

By default, quantifiers are greedy: they match as much text as they can.

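Greedy
- (.*) without ?: matches as much as possible, so on the sample text only one match is found. For <root>(.*)</root>, group 1 is 这是root1</root><root>这是root2, i.e. everything between the first <root> and the last </root>.
Reluctant (lazy)
- (.*?) with ?: matches as little as possible. For <root>(.*?)</root>, the first match's group 1 is 这是root1, and the next find() returns 这是root2. This is usually the result you want.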
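The following runnable comparison uses input shaped like the sample above. ASCII text (root1, root2) stands in for the original sample strings, and groupOnes() is a hypothetical helper that collects group 1 of every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    static List<String> groupOnes(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            out.add(matcher.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String txt = "abc123<root>root1</root><root>root2</root>";
        // Greedy: .* runs to the end of the input, then backtracks to the LAST </root>.
        System.out.println(groupOnes("<root>(.*)</root>", txt));   // [root1</root><root>root2]
        // Lazy: .*? grows one character at a time and stops at the FIRST </root>.
        System.out.println(groupOnes("<root>(.*?)</root>", txt));  // [root1, root2]
    }
}
```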

2. Group numbering

// matcher.group() is equivalent to matcher.group(0): the entire match
matcher.group(1) // index 1 is the first capturing group
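The following example shows the numbering rule with nested groups. Capturing groups are numbered by the position of their opening parenthesis, counting from the left, and groupCount() does not include group 0. This is a minimal sketch. allGroups() is a hypothetical helper that lists groups 0 through groupCount() of the first match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Returns group(0) .. group(groupCount()) of the first match, or an empty list.
    static List<String> allGroups(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            for (int i = 0; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Outer "(" opens group 1; the nested ones are groups 2 and 3.
        System.out.println(allGroups("((a)(b))c", "xabc"));  // [abc, ab, a, b]
    }
}
```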

3. Named groups

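(?<element>...) defines a named capturing group called element, so its text can be retrieved by name with matcher.group("element") instead of by number: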

String txt = "abc123<root>这是root1</root><root>这是root2</root>";
Matcher matcher = Pattern.compile("<root>(?<element>.*?)</root>").matcher(txt);
int matcher_start = 0;
while (matcher.find(matcher_start)){
    System.out.println(matcher.group("element"));
    matcher_start = matcher.end();
}

// -------- Output
这是root1
这是root2
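Named groups can also be referenced in a replacement string as ${name}, and they keep their ordinary numbers as well. The following sketch assumes a hypothetical yyyy-MM date format. The DATE pattern and the method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroups {
    // Hypothetical format: four-digit year, dash, two-digit month.
    private static final Pattern DATE = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})");

    // Reorders "yyyy-MM" into "MM/yyyy" by referring to the named groups as ${name}.
    static String reorder(String input) {
        return DATE.matcher(input).replaceAll("${month}/${year}");
    }

    // A named group still has a number: year is group 1, month is group 2.
    static boolean nameAndIndexAgree(String input) {
        Matcher matcher = DATE.matcher(input);
        return matcher.find() && matcher.group("year").equals(matcher.group(1));
    }

    public static void main(String[] args) {
        System.out.println(reorder("due 2024-05"));        // due 05/2024
        System.out.println(nameAndIndexAgree("2024-05"));  // true
    }
}
```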