
Linux/Regular Expression


1. Introduction

When you work with a Unix-like operating system such as Linux, there are a few things you inevitably have to learn.
Regular expressions are one of them.
Roughly speaking, they serve a purpose similar to the * (asterisk) wildcard on MS systems.

Almost every Unix application supports regular expressions. Once learned they are extremely useful,
but learning them is tedious. -.-;

2. Mastering Regular Expressions, 2nd Edition

2.1. Ch1 Introduction to Regular Expressions

  • ^ (caret) : the start of a line. ^cat matches lines that begin with cat (cats, cater, caterer, ...). It matches a position in the line rather than real text.
  • $ (dollar) : the end of a line. cat$ matches lines that end with cat (blackcat, whitecat, ...). It also matches a position in the line rather than real text.
  • [] (character class) : seper[ae]te matches both seperate and seperete.
    • - (dash) : denotes a range (only inside []; outside, it is a literal '-'). [1-6] means 1 through 6.
    • ^ : inside [] it negates the class. [^1-6] means any character other than 1 through 6.
  • . (dot) : matches any single character, letter or symbol. 03.19.76 matches 03/19/76, 03-19-76, 03.19.76, and so on.
  • | (OR) : combines several expressions into one. Inside [] it is
     just a literal character. gr[ea]y, grey|gray, and gr(a|e)y are equivalent.
  • () (parentheses) : a(a|b)cde matches both aacde and abcde. [a|b] treats | as a literal character, but wrapping the alternatives in () as (a|b) makes | act as OR.
  • -i (option) : ignores case. Example: egrep -i '^(From|Subject|Date): ' mailbox
  • \< : the position where a word starts
  • \> : the position where a word ends
  • ? : one optional item. The character before ? may be present or absent. colou?r matches both color and colour.
  • * : the preceding item any number of times, including zero
  • + : the preceding item one or more times
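
The metacharacters in this list can be tried out with a short sketch. Python's re module is used here purely as an assumption for illustration (the notes themselves use egrep). Python has no \< and \>, so \b stands in for both word boundaries, and the sample strings are taken from the bullets where possible.

```python
import re

# (pattern, text, expected) triples based on the bullets above.
cases = [
    (r"^cat", "caterer", True),          # ^ anchors at the start of the line
    (r"cat$", "blackcat", True),         # $ anchors at the end of the line
    (r"seper[ae]te", "seperete", True),  # class: exactly one of a or e
    (r"[^1-6]", "123", False),           # negated class finds nothing here
    (r"03.19.76", "03/19/76", True),     # dot matches any single character
    (r"gr(a|e)y", "grey", True),         # | chooses between alternatives
    (r"colou?r", "color", True),         # ? makes the u optional
    (r"\bcat\b", "a cat sat", True),     # \b stands in for \< and \>
    (r"\bcat\b", "concatenate", False),  # cat inside a longer word
]

# True for every case whose search result agrees with the expectation.
results = [bool(re.search(p, text)) == expected for p, text, expected in cases]
print(all(results))

# re.IGNORECASE corresponds to the -i option of egrep.
header = re.search(r"^(From|Subject|Date): ", "subject: hello", re.IGNORECASE)
print(header.group(1))
```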

3. Regular Expression Reference Notes

(1) ^ (caret) : marks the start of a line or of a string

Example : ^aaa (true if the string begins with aaa, false otherwise)

(2) $ (dollar) : marks the end of a line or of a string

Example : aaa$ (true if the string ends with aaa, false otherwise)

(3) . (period) : stands for any single character

Example : ^a.c (true for strings that begin with abc, adc, aZc, and so on; false for aa)

a..b$ (true for strings that end with aaab, abbb, azzb, and so on)

(4) [] (bracket) : denotes a set or range of characters; a "-" between two characters denotes a range

A leading "^" inside [] means not.

There is also the form [:class:], which names a character class inside a bracket expression.

Here, class can be alpha, blank, cntrl, digit, graph, lower,

print, space, upper, or xdigit.

For details, see <ctype.h> in the C language.

For example, [[:digit:]] is the same as [0-9], and [[:alpha:]] is the same as [A-Za-z].

In addition, in implementations that support them, [[:<:]] and [[:>:]] mark the start and

the end of a word (made up of digits, letters, and '_').

Example : [abc] (any one of a, b, c; same as [a-c])

[Yy] (Y or y)

[A-Za-z0-9] (all letters and digits)

[-A-Z] ("-" (hyphen) and all uppercase letters)

[^a-z] (any character other than a lowercase letter)

[^0-9] (any character other than a digit)

[[:digit:]] (same as [0-9])
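
As a rough check of the bracket examples, the sketch below lists which characters of a made-up sample string each set picks out. Python is an assumption here, and its re module does not implement the POSIX [:class:] names, so [0-9] is written out instead of [[:digit:]].

```python
import re

text = "Yes-No 42"  # hypothetical sample string

picked = {
    "[Yy]": re.findall(r"[Yy]", text),      # Y or y
    "[-A-Z]": re.findall(r"[-A-Z]", text),  # hyphen or an uppercase letter
    "[^a-z]": re.findall(r"[^a-z]", text),  # anything but a lowercase letter
    "[0-9]": re.findall(r"[0-9]", text),    # a digit
}

for pattern, chars in picked.items():
    print(pattern, chars)
```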

(5) {} (brace) : the number inside {} gives the count, or range of counts, of the immediately preceding character

Example : a{3} ('a' repeated 3 times; only aaa)

a{3,} ('a' repeated 3 or more times: aaa, aaaa, aaaaa, ...)

a{3,5} (only aaa, aaaa, and aaaaa)

ab{2,3} (only abb and abbb)

[0-9]{2} (a two-digit number)

doc[7-9]{2} (doc77, doc87, doc97, and so on)

[^Zz]{5} (a run of 5 characters containing neither Z nor z: abcde, ttttt, and so on)

.{3,4}er (three or four characters followed by 'er': Peter, mother, and so on)
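
The "only" in these descriptions assumes the whole string has to match. A minimal sketch of that reading, assuming Python and its re.fullmatch (the helper name full is made up for this example):

```python
import re

def full(pattern, text):
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

checks = {
    ("a{3}", "aaa"): full(r"a{3}", "aaa"),
    ("a{3}", "aaaa"): full(r"a{3}", "aaaa"),
    ("a{3,5}", "aaaaa"): full(r"a{3,5}", "aaaaa"),
    ("a{3,5}", "aaaaaa"): full(r"a{3,5}", "aaaaaa"),
    ("ab{2,3}", "abbb"): full(r"ab{2,3}", "abbb"),
    ("ab{2,3}", "ab"): full(r"ab{2,3}", "ab"),
    ("doc[7-9]{2}", "doc87"): full(r"doc[7-9]{2}", "doc87"),
    (".{3,4}er", "mother"): full(r".{3,4}er", "mother"),
}

for (pattern, text), ok in checks.items():
    print(pattern, text, ok)
```

With an unanchored search such as grep performs, a{3} would also be found inside aaaa, which is why the sketch uses a full match.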

(6) * (asterisk) : the character immediately before "*" occurs zero or more times

Example : ab*c ('b' occurs zero or more times: ac, ackdddd, abc, abbc, abbbbbbbc, and so on)

* (there is no preceding character, so this is described as matching any string, including the empty string; many engines instead treat a leading * as a literal asterisk or reject it)

.* (the preceding character is ".", so it matches any string of zero or more characters, including the empty string)

ab* ('b' occurs zero or more times: a, accc, abb, abbbbbbb, and so on)

a* ('a' occurs zero or more times: k, kdd, sdfrrt, a, aaaa, abb, the empty string, and so on)

doc[7-9]* (doc7, doc777, doc778989, doc, and so on)

[A-Z]* (a string made up only of uppercase letters, possibly empty)

like.* (the preceding character is '.', so this is like followed by zero or more characters: like, likely, liker, likelihood, and so on)

(7) + (plus) : the character immediately before "+" occurs one or more times

Example : ab+c ('b' occurs one or more times: abc, abckdddd, abbc, abbbbbbbc, and so on; ac does not match)

ab+ ('b' occurs one or more times: ab, abccc, abb, abbbbbbb, and so on)

like.+ (the preceding character is '.', so this is like followed by one or more characters: likely, liker, likelihood, and so on; like itself does not match)

[A-Z]+ (a string made up only of uppercase letters)

(8) ? (question mark) : the character immediately before "?" occurs zero times or once

Example : ab?c ('b' occurs zero times or once: ac, abc, and abcd match; abbc does not)
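
The difference between *, + and ? is easiest to see side by side. The sketch below, which assumes Python's re module and whole-string matching, runs the three ab_c patterns over the same words.

```python
import re

words = ["ac", "abc", "abbc"]

# For each quantifier, collect the words that match in full.
table = {
    pattern: [w for w in words if re.fullmatch(pattern, w)]
    for pattern in (r"ab*c", r"ab+c", r"ab?c")
}

for pattern, matched in table.items():
    print(pattern, matched)

# .* and a* also accept the empty string; .+ and a+ do not.
empty_ok = {p: re.fullmatch(p, "") is not None for p in (r".*", r".+", r"a*", r"a+")}
print(empty_ok)
```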

(9) () (parenthesis) : () groups a pattern within a regular expression

(10) | (bar) : means or

Example : a|b|c (one of a, b, c; same as [a-c])

yes|Yes (either yes or Yes; same as [yY]es)

korea|japan|chinese (one of korea, japan, chinese)

(11) \ (backslash) : to treat one of the special characters above as an ordinary character within a regular expression, put '\' in front of it

Example : filename\.ext (stands for "filename.ext")

[\?\[\\\]] (one of '?', '[', '\', ']')
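
A small sketch of what the backslash changes, assuming Python's re module (where, as in the example above, a backslash also escapes characters inside []). The sample strings are made up.

```python
import re

# With the escape, the dot must be a literal dot.
escaped_dot = [
    bool(re.fullmatch(r"filename\.ext", "filename.ext")),
    bool(re.fullmatch(r"filename\.ext", "filenameXext")),
]

# Without the escape, the dot accepts any character.
plain_dot = bool(re.fullmatch(r"filename.ext", "filenameXext"))

# The bracket example picks out exactly the four escaped characters.
specials = re.findall(r"[\?\[\\\]]", r"a?b[c\d]")

print(escaped_dot, plain_dot, specials)
```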


In a regular expression, every character other than the special characters mentioned above is treated as an ordinary character. Regular expressions can be used in the standard Unix utilities such as vi, emacs, ed, sed, awk, grep, and egrep. The following examples show regular expressions used with grep.

(1) $ command | grep 'regex'

<= feeds the output of command to grep, which uses the regular expression to find the pattern

Example : $ who | grep 'hgkim' <= checks whether the user hgkim is logged in

$ ls -al | grep '^d.*' <= prints only the lines of the ls -al output that start with 'd' (that is, the directories)

$ ls -al | grep '^[^d]..x..x..x' <= excludes directories ("[^d]") and finds the files that everyone can execute ("..x..x..x")
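
To see why the two ls patterns select what they do, the sketch below applies them to a few hypothetical listing lines (the file names, owner and sizes are invented for illustration). Python's re.search is used as a stand-in for grep, since both look for the pattern anywhere in the line unless it is anchored.

```python
import re

# Hypothetical `ls -al` output lines, made up for illustration.
listing = [
    "drwxr-xr-x  2 hgkim users 4096 Jan  1 12:00 docs",
    "-rwxr-xr-x  1 hgkim users  512 Jan  1 12:00 run.sh",
    "-rw-r--r--  1 hgkim users  128 Jan  1 12:00 notes.txt",
]

def names(pattern):
    """Return the file names of the lines in which the pattern is found."""
    return [line.split()[-1] for line in listing if re.search(pattern, line)]

directories = names(r"^d.*")             # lines whose mode string starts with d
executables = names(r"^[^d]..x..x..x")   # non-directories with x for user, group, other

print(directories, executables)
```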

(2) $ grep 'regex' filename

<= reads the file as input and uses the regular expression to find the pattern

Example : $ grep 'telnet' /etc/inetd.conf


Other commands are used in much the same way as grep. Making good use of regular expressions will therefore double what you can get out of Unix.



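PHP provides the following four functions for working with regular expressions. (The ereg family was deprecated in PHP 5.3 and removed in PHP 7.0; current PHP offers the preg_* functions instead.)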


int ereg(string givenPattern, string givenString, array matched);

- Suppose givenString has the form "string1stringAstring2stringBstring3 ... string9stringI". Here stringA, stringB, ... , stringI may be empty (in which case givenString is "string1string2string3 ... string9").

- For a givenString of this form,

givenPattern must be written as "(pattern1)stringA(pattern2)stringB(pattern3) ... (pattern9)stringI". That is, pattern1, pattern2, ..., pattern9 are the regular expressions to look for in string1, string2, ... , string9 respectively.

- The text that pattern1 finds in string1 is stored in $matched[1], the text that pattern2 finds in string2 is stored in $matched[2], ..., and the text that pattern9 finds in string9 is stored in $matched[9]. Note that in PHP3, ereg is set up to capture at most 9 patterns.

- $matched[0] holds $matched[1]stringA$matched[2]stringB ... $matched[9]stringI.

- The value ereg returns is the number of characters in the string stored in $matched[0].

- ereg is case sensitive

- eregi is case insensitive


Example 1 :

Code => print(ereg ("(.*)ef([abc].*)","abcdefabc",$matched));

print("<br>");

while (list($a,$b)=each($matched))

if ($b)

print("$a, $b <br>");

Result => 9

0, abcdefabc

1, abcd

2, abc

Example 2 :

Code => print(ereg ("(.*)d(.*)e(.*)qrs(.*)","abcdefghijklmnopqrstuvwxyz",$matched));

print("<br>");

while (list($a,$b)=each($matched))

if ($b)

print("$a, $b <br>");

Result => 26

0, abcdefghijklmnopqrstuvwxyz

1, abc

3, fghijklmnop

4, tuvwxyz
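
The behavior described for ereg (return the length of the whole match and fill an array with the captured substrings) can be imitated in a few lines. This is only a sketch in Python under stated assumptions: ereg_like is a hypothetical helper, not a PHP or Python API, and it returns 0 and an empty list when nothing matches, which simplifies what PHP does.

```python
import re

def ereg_like(pattern, text):
    """Hypothetical helper imitating the ereg behavior described above.

    Returns (length of the whole match, [whole match, group 1, group 2, ...]).
    """
    m = re.search(pattern, text)
    if m is None:
        return 0, []
    # Index 0 holds the whole match, like $matched[0]; unset groups become "".
    matched = [m.group(0)] + [g or "" for g in m.groups()]
    return len(m.group(0)), matched

first = ereg_like(r"(.*)ef([abc].*)", "abcdefabc")
second = ereg_like(r"(.*)d(.*)e(.*)qrs(.*)", "abcdefghijklmnopqrstuvwxyz")

print(first)
print(second)
```

In the second call the group between d and e is empty, which is why index 2 is missing from the printed PHP result: the loop there skips empty values.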

Example 3 :

Code => $date="1999-11-17";

if (ereg("([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})", $date, $regs))

print("$regs[3].$regs[2].$regs[1]");

else print("Invalid date format: $date");

Result => 17.11.1999
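
For comparison, the same date reordering can be sketched with Python's re module (an assumption for illustration; the function name reorder_date is made up). The three groups are captured in year, month, day order and printed back in reverse.

```python
import re

def reorder_date(date):
    """Turn YYYY-MM-DD into DD.MM.YYYY, or report an invalid format."""
    m = re.search(r"([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})", date)
    if m is None:
        return f"Invalid date format: {date}"
    year, month, day = m.groups()
    return f"{day}.{month}.{year}"

print(reorder_date("1999-11-17"))
print(reorder_date("17 Nov 1999"))
```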

Example 4 :

Code => $joomin="711011-1234567";

if (ereg("([0-9]{2})([01]{1}[0-9]{1}[0-3]{1}[0-9]{1})-([12]{1}[0-9]{6})",$joomin, $regs))

print("Valid");

else print("Invalid format: $joomin");


int eregi(string givenPattern, string givenString, array matched);

- the case-insensitive version of ereg


Example :

Code => $email="xs9_tx-abc.yyy_c@cne.kyungsung.ac.kr";

eregi("(^[_\.0-9a-z-]+)@(([0-9a-z][0-9a-z-]+\.)+)([a-z]{2,3}$)",$email,$matched);

while (list($a,$b)=each($matched))

if ($b) print("$a, $b <br>");


Result => 0, xs9_tx-abc.yyy_c@cne.kyungsung.ac.kr

1, xs9_tx-abc.yyy_c

2, cne.kyungsung.ac.

3, ac.

4, kr



Code => eregi("^[_\.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$",$email,$matched);

while (list($a,$b)=each($matched))

if ($b) print("$a, $b <br>");

Result => 0, xs9_tx-abc.yyy_c@cne.kyungsung.ac.kr

1, ac.
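
The group numbering in the first e-mail pattern can be checked with a short sketch. Python's re module with re.IGNORECASE is assumed here as a stand-in for eregi. Note that a group inside a repetition keeps only its last repetition, which is why the third group ends up as "ac." rather than the whole domain.

```python
import re

email = "xs9_tx-abc.yyy_c@cne.kyungsung.ac.kr"
pattern = r"(^[_\.0-9a-z-]+)@(([0-9a-z][0-9a-z-]+\.)+)([a-z]{2,3}$)"

m = re.search(pattern, email, re.IGNORECASE)

# Groups are numbered by the position of their opening parenthesis.
groups = m.groups() if m else ()
for number, value in enumerate(groups, start=1):
    print(number, value)
```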



string ereg_replace(string givenPattern, string replacementPattern, string givenString);

- Finds the text in givenString that matches givenPattern (the matched text) and

replaces it with replacementPattern

- If givenPattern contains substrings wrapped as "(pattern)", replacementPattern should contain the corresponding references of the form "\\digit" together with the replacement text (digit is one of 0, 1, ... , 9). Each "\\digit" stands for the text found by the digit-th "(pattern)", counted by opening parenthesis, and the matched part of givenString is replaced by the resulting string. "\\0" stands for the whole text matched by givenPattern.

- Returns the modified string

- case sensitive


Example :

Code => $string = "This is a test";

print(ereg_replace(" is", " was",$string)); print("<br>");

print(ereg_replace("( )is","\\1was",$string)); print("<br>");

print(ereg_replace("(( )is)","\\2was",$string)); print("<br>");

print(ereg_replace("(( )is)(( )a)(( )test)", "\\2was\\4an\\6exam",$string));

Result => "This was a test";

"This was a test";

"This was a test";

"This was an exam";


Example 2 : removing redundant whitespace

Code => $str ="~ s/\s+/ /g";

$str = eregi_replace("[[:space:]]+", " ", $str);

print("$str<br>");

Result => ~ s/\s+/ /g
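
Replacement with back-references can be sketched with Python's re.sub, assumed here as a stand-in for ereg_replace; like it, re.sub replaces every match and numbers groups by their opening parenthesis. The whitespace sample string is made up, and \s+ is used because Python's re has no [[:space:]].

```python
import re

s = "This is a test"

replaced = [
    re.sub(r" is", " was", s),         # no groups at all
    re.sub(r"( )is", r"\1was", s),     # \1 is the captured space
    re.sub(r"(( )is)", r"\2was", s),   # \2 is the inner group, the space
    # Groups 2, 4 and 6 are the three inner spaces.
    re.sub(r"(( )is)(( )a)(( )test)", r"\2was\4an\6exam", s),
]
for line in replaced:
    print(line)

# Collapse each run of whitespace into a single space.
squeezed = re.sub(r"\s+", " ", "too   many \t spaces")
print(squeezed)
```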


string eregi_replace(string givenPattern, string replacementPattern, string givenString);

- the case-insensitive version of ereg_replace


last modified 2021-02-07 05:23:39