
Linux/Regular Expression


1. Introduction

When working with a Unix-like operating system such as Linux, there are a few things you simply have to master.
Regular expressions are one of them.
Roughly speaking, they serve a purpose similar to the * (asterisk) wildcard on MS systems.

Most Unix applications support regular expressions. Once learned they are extremely useful,
but learning them is a chore. -.-;

2. Mastering Regular Expressions, 2nd Edition

2.1. Ch1 Introduction to Regular Expressions

  • ^ (caret) : marks the start of a line. ^cat matches lines that begin with cat (cats, cater, caterer, ...). It matches a position in the line rather than real text.
  • $ (dollar) : marks the end of a line. cat$ matches lines that end with cat (blackcat, whitecat, ...). It matches a position in the line rather than real text.
  • [] (character class) : seper[ae]te matches both seperate and seperete.
    • - (dash) : denotes a range (only inside []; outside, '-' stands for itself). [1-6] means 1 through 6.
    • ^ : inside [] it means negation. [^1-6] matches anything that is not 1 through 6.
  • . (dot) : matches any character, whether letter or symbol. 03.19.76 matches 03/19/76, 03-19-76, 03.19.76, and so on.
  • | (OR) : combines several expressions into one. Inside [] it is just a literal character. gr[ea]y, grey|gray, and gr(a|e)y are equivalent.
  • () (parentheses) : a(a|b)cde matches both aacde and abcde. [a|b] treats | as a literal character, but wrapping it in () as (a|b) makes | mean OR.
  • -i (option) : ignores case. e.g. egrep -i '^(From|Subject|Date): ' mailbox
  • \< : the position where a word starts
  • \> : the position where a word ends
  • ? : one optional. The character before ? may be present or absent. colou?r matches both color and colour.
  • * : the preceding item may occur any number of times (including zero)
  • + : the preceding item may occur any number of times, but at least once
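
The operators in this list are easy to try out interactively. The following is a minimal sketch using Python's re module instead of egrep. The function names are made up for illustration, and \b is used as a stand-in for \< and \>, since Python's re has no separate word-start and word-end anchors.

```python
import re

# Anchors match a position in the line, not a character.
def starts_with_cat(line: str) -> bool:
    return re.search(r"^cat", line) is not None

def ends_with_cat(line: str) -> bool:
    return re.search(r"cat$", line) is not None

# Character class, alternation, and grouping: three spellings of one pattern.
GREY_VARIANTS = [r"gr[ea]y", r"grey|gray", r"gr(a|e)y"]

def matches_all_variants(word: str) -> bool:
    return all(re.fullmatch(p, word) is not None for p in GREY_VARIANTS)

# \b is a word boundary (either side); re.escape keeps the word literal.
def has_word(word: str, text: str) -> bool:
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None

# ? makes the preceding character optional.
def is_colour(word: str) -> bool:
    return re.fullmatch(r"colou?r", word) is not None

if __name__ == "__main__":
    print(starts_with_cat("caterer"), ends_with_cat("blackcat"))
    print(matches_all_variants("grey"), matches_all_variants("gray"))
    print(has_word("cat", "the cat sat"), has_word("cat", "concatenate"))
    print(is_colour("color"), is_colour("colour"))
```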

3. Regular Expressions (reposted material)

(1) ^ (caret) : marks the start of a line or string

Example : ^aaa (true if the string starts with aaa, false otherwise)

(2) $ (dollar) : marks the end of a line or string

Example : aaa$ (true if the string ends with aaa, false otherwise)

(3) . (period) : stands for any single character

Example : ^a.c (true for strings that start with abc, adc, aZc, etc.; false for aa)

a..b$ (true if the string ends with aaab, abbb, azzb, etc.)

(4) [] (bracket) : denotes a set or range of characters; a "-" between two characters denotes a range

A leading "^" inside [] means "not".

There is also the form [:character class:], which names a character class inside the brackets.

The character classes include alpha, blank, cntrl, digit, graph, lower, print, space, upper, and xdigit.

See <ctype.h> in C for details.

For example, [[:digit:]] is the same as [0-9], and [[:alpha:]] is the same as [A-Za-z].

In addition, [[:<:]] and [[:>:]] denote the start and end of a word (made up of digits, letters, and '_').

Example : [abc] (any one of a, b, c; same as "[a-c]")

[Yy] (Y or y)

[A-Za-z0-9] (any letter or digit)

[-A-Z] ("-" (hyphen) or any uppercase letter)

[^a-z] (any character other than a lowercase letter)

[^0-9] (any character other than a digit)

[[:digit:]] (same as [0-9])
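
The bracket examples above can be checked with a short script. This is only a sketch: the function names are invented here, and because Python's re module does not support POSIX classes such as [[:digit:]], the equivalent explicit ranges are used instead.

```python
import re

def only_alnum(text: str) -> bool:
    # [A-Za-z0-9]+ : one or more letters or digits, nothing else.
    return re.fullmatch(r"[A-Za-z0-9]+", text) is not None

def first_non_digit(text: str):
    # [^0-9] : the first character that is not a digit, or None.
    m = re.search(r"[^0-9]", text)
    return m.group(0) if m else None

def hyphen_or_upper(ch: str) -> bool:
    # A "-" placed first in the brackets is a literal hyphen, not a range.
    return re.fullmatch(r"[-A-Z]", ch) is not None

if __name__ == "__main__":
    print(only_alnum("doc77"), only_alnum("a-b"))
    print(first_non_digit("123x5"), first_non_digit("123"))
    print(hyphen_or_upper("-"), hyphen_or_upper("Q"), hyphen_or_upper("q"))
```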

(5) {} (brace) : the number(s) inside {} give how many times, or the range of times, the immediately preceding character occurs

Example : a{3} ('a' repeated 3 times; only aaa matches)

a{3,} ('a' repeated 3 or more times: aaa, aaaa, aaaaa, ...)

a{3,5} (only aaa, aaaa, aaaaa)

ab{2,3} (only abb and abbb)

[0-9]{2} (a two-digit number)

doc[7-9]{2} (doc77, doc87, doc97, etc.)

[^Zz]{5} (a string of 5 characters containing neither Z nor z: abcde, ttttt, etc.)

.{3,4}er (three or four characters followed by 'er': Peter, mother, etc.)
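
As a quick check of the brace examples above, the sketch below runs each pattern against the strings listed for it. It uses Python's re module; the helper name failures and the sample value "42" are assumptions added for illustration.

```python
import re

# Pattern -> strings that the examples above say should match in full.
EXAMPLES = {
    r"a{3}": ["aaa"],
    r"a{3,}": ["aaa", "aaaa", "aaaaa"],
    r"a{3,5}": ["aaa", "aaaa", "aaaaa"],
    r"ab{2,3}": ["abb", "abbb"],
    r"[0-9]{2}": ["42"],  # made-up two-digit sample
    r"doc[7-9]{2}": ["doc77", "doc87", "doc97"],
    r"[^Zz]{5}": ["abcde", "ttttt"],
    r".{3,4}er": ["Peter", "mother"],
}

def failures(examples):
    """Return the (pattern, text) pairs where text does not fully match."""
    return [
        (pattern, text)
        for pattern, texts in examples.items()
        for text in texts
        if re.fullmatch(pattern, text) is None
    ]

if __name__ == "__main__":
    print(failures(EXAMPLES))            # an empty list means all examples hold
    print(failures({r"a{3,5}": ["aaaaaa"]}))  # six a's exceed the upper bound
```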

(6) * (asterisk) : the character immediately before "*" occurs zero or more times

Example : ab*c ('b' occurs zero or more times, so strings containing ac, abc, abbc, etc. match: ac, ackdddd, abc, abbc, abbbbbbbc)

* (there is no preceding character to repeat; a leading * is a literal asterisk in POSIX basic regular expressions and undefined in extended ones, so avoid it)

.* (the preceding character is ".", so it matches any string of zero or more characters, including the empty string)

ab* ('b' occurs zero or more times: a, accc, abb, abbbbbbb, etc.)

a* ('a' occurs zero or more times: k, kdd, sdfrrt, a, aaaa, abb, the empty string, etc.)

doc[7-9]* (doc7, doc777, doc778989, doc, etc.)

[A-Z]* (a run of zero or more uppercase letters)

like.* (the preceding character is '.', so like followed by zero or more characters: like, likely, liker, likelihood, etc.)

(7) + (plus) : the character immediately before "+" occurs one or more times

Example : ab+c ('b' occurs one or more times: abc, abckdddd, abbc, abbbbbbbc, etc.; ac does not match)

ab+ ('b' occurs one or more times: ab, abccc, abb, abbbbbbb, etc.)

like.+ (the preceding character is '.', so like followed by one or more characters: likely, liker, likelihood, etc.; like itself does not match)

[A-Z]+ (a run of one or more uppercase letters)

(8) ? (question mark) : the character immediately before "?" occurs zero or one time

Example : ab?c ('b' occurs zero or one time: ac and abc match, abbc does not)

(9) () (parenthesis) : () is used to group patterns inside a regular expression

(10) | (bar) : means "or"

Example : a|b|c (one of a, b, c; same as [a-c])

yes|Yes (either yes or Yes; same as [yY]es)

korea|japan|chinese (one of korea, japan, chinese)
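
The repetition operators *, +, and ?, together with grouping and alternation, can be compared side by side. The sketch below uses Python's re module with two hypothetical helpers: one requires the whole string to match, the other matches anywhere in the string the way grep does, which is why a string such as ackdddd counts as a match for ab*c.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

def matches_anywhere(pattern: str, text: str) -> bool:
    """True if the pattern matches some part of the text, as grep does."""
    return re.search(pattern, text) is not None

if __name__ == "__main__":
    for pattern in ("ab*c", "ab+c", "ab?c"):
        for text in ("ac", "abc", "abbc"):
            print(pattern, text, matches_whole(pattern, text))
    print(matches_anywhere("ab*c", "ackdddd"))
    print(matches_whole("korea|japan|chinese", "japan"))
    print(matches_whole("[yY]es", "Yes"), matches_whole("yes|Yes", "Yes"))
```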

(11) \ (backslash) : to treat one of the special characters above as an ordinary character inside a regular expression, put '\' in front of it

Example : filename\.ext (matches "filename.ext")

[\?\[\\\]] (one of '?', '[', '\', ']')
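
A small sketch of escaping, again in Python's re module. The three function names are assumptions for illustration; re.escape is the standard-library helper that puts a backslash in front of every special character in a string.

```python
import re

def is_filename_ext(name: str) -> bool:
    # \. matches a literal dot; an unescaped . would also accept "filenameXext".
    return re.fullmatch(r"filename\.ext", name) is not None

def is_special(ch: str) -> bool:
    # One of ? [ \ ] as in the bracket example above.
    return re.fullmatch(r"[\?\[\\\]]", ch) is not None

def literal_search(needle: str, haystack: str) -> bool:
    # Escape the needle so it is searched for as plain text.
    return re.search(re.escape(needle), haystack) is not None

if __name__ == "__main__":
    print(is_filename_ext("filename.ext"), is_filename_ext("filenameXext"))
    print([is_special(c) for c in "?[\\]a"])
    print(literal_search("a.b*c", "xa.b*cy"), literal_search("a.b*c", "aXbbc"))
```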


In a regular expression, every character other than the special characters mentioned above is treated as an ordinary character.


Regular expressions can be used in the standard Unix utilities such as vi, emacs, ed, sed, awk, grep, and egrep. The following examples use regular expressions with grep.

(1) $ command | grep 'regex'

<= grep reads the output of the command and uses the regular expression to find the pattern

Example : $ who | grep 'hgkim' <= checks whether the user hgkim is logged in

$ ls -al | grep '^d.*' <= prints only the lines of the ls -al output that start with 'd' (i.e. the directories)


$ ls -al | grep '^[^d]..x..x..x' <= finds the files that are not directories ("[^d]") and that anyone can execute ("..x..x..x")

(2) $ grep 'regex' filename

<= reads the file and uses the regular expression to find the pattern

Example : $ grep 'telnet' /etc/inetd.conf
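
What grep does with these patterns can be sketched in a few lines of Python. The function grep below is a simplified stand-in written for this page, not the real utility, and the LISTING lines are made-up sample output in the style of ls -al.

```python
import re
from typing import Iterable, List

def grep(pattern: str, lines: Iterable[str]) -> List[str]:
    """Keep the lines in which the pattern matches somewhere."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

# Made-up sample lines in the style of `ls -al` output.
LISTING = [
    "drwxr-xr-x  2 hgkim users 4096 Jan  1 00:00 docs",
    "-rwxr-xr-x  1 hgkim users  120 Jan  1 00:00 run.sh",
    "-rw-r--r--  1 hgkim users   12 Jan  1 00:00 notes.txt",
]

if __name__ == "__main__":
    print(grep(r"^d.*", LISTING))              # directories only
    print(grep(r"^[^d]..x..x..x", LISTING))    # non-directories executable by all
```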


ด™ธ˜ ช… –ด“ค„ grep œ ‚ฌํ•œ ํ˜•ํƒœกœ ดšฉœ‹ค. ”ฐ„œ  •ทœ‹„ ž˜ ดšฉํ•˜ฉด œ ‹‰Šค˜ ํ™œšฉด ฐฐ€   ฒƒด‹ค.



PHP provides the following four functions for regular expressions.


int ereg(string givenPattern, string givenString, array matched);

- Suppose givenString has the form "string1stringAstring2stringBstring3 ... string9stringI". stringA, stringB, ... , stringI may be empty (NULL), in which case givenString is simply "string1string2string3 ... string9".

- When givenString has this form,

givenPattern must be written as "(pattern1)stringA(pattern2)stringB(pattern3) ... (pattern9)stringI". That is, pattern1, pattern2, ..., pattern9 are the regular expressions to look for in string1, string2, ... , string9 respectively.

- The text that pattern1 finds in string1 is stored in $matched[1], the text that pattern2 finds in string2 is stored in $matched[2], ..., and the text that pattern9 finds in string9 is stored in $matched[9]. Note that in PHP3, ereg can find at most 9 patterns.

- $matched[0] holds $matched[1]stringA$matched[2]stringB ... $matched[9]stringI.

- The value ereg returns is the length of the string stored in $matched[0].

- ereg is case sensitive

- eregi is case insensitive


Example 1 :

Code => print(ereg ("(.*)ef([abc].*)","abcdefabc",$matched));

print("<br>");

while (list($a,$b)=each($matched))

if ($b)

print("$a, $b <br>");

Result => 9

0, abcdefabc

1, abcd

2, abc

Example 2 :

Code => print(ereg ("(.*)d(.*)e(.*)qrs(.*)","abcdefghijklmnopqrstuvwxyz",$matched));

print("<br>");

while (list($a,$b)=each($matched))

if ($b)

print("$a, $b <br>");

Result => 26

0, abcdefghijklmnopqrstuvwxyz

1, abc

3, fghijklmnop

4, tuvwxyz

(index 2 matched the empty string between d and e, so the if ($b) test skips it)
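
Examples 1 and 2 can be reproduced with a small Python sketch. The function ereg_like is a hypothetical stand-in that mimics the behaviour described above (index 0 holds the whole match, the return value is its length); returning 0 when nothing matches is a simplification chosen here. Python's re uses a backtracking matcher rather than POSIX matching rules, but for these two patterns the results are the same.

```python
import re

def ereg_like(pattern: str, text: str):
    """Return (length of the whole match, [whole match, group 1, group 2, ...])."""
    m = re.search(pattern, text)
    if m is None:
        return 0, []
    # Index 0 is the whole match; 1..n are the parenthesised groups.
    matched = [m.group(0)] + list(m.groups())
    return len(m.group(0)), matched

if __name__ == "__main__":
    print(ereg_like(r"(.*)ef([abc].*)", "abcdefabc"))
    length, matched = ereg_like(
        r"(.*)d(.*)e(.*)qrs(.*)", "abcdefghijklmnopqrstuvwxyz"
    )
    print(length)
    for index, value in enumerate(matched):
        if value:  # skip empty groups, like the if ($b) test
            print(index, value)
```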

Example 3 :

Code => $date="1999-11-17";

if (ereg("([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})", $date, $regs))

print("$regs[3].$regs[2].$regs[1]");

else print("Invalid date format: $date");

Result => 17.11.1999
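
The same date rearrangement, sketched in Python. The function name reformat_date is made up for illustration. Unlike ereg, which matches anywhere in the string, this sketch uses re.fullmatch so that trailing junk is rejected; that stricter choice is an assumption, not part of the original example.

```python
import re
from typing import Optional

DATE = re.compile(r"([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})")

def reformat_date(date: str) -> Optional[str]:
    """Turn YYYY-MM-DD into DD.MM.YYYY, or return None if the format is wrong."""
    m = DATE.fullmatch(date)
    if m is None:
        return None
    year, month, day = m.groups()
    return f"{day}.{month}.{year}"

if __name__ == "__main__":
    print(reformat_date("1999-11-17"))
    print(reformat_date("1999/11/17"))
```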

Example 4 :

Code => $joomin="711011-1234567";

if (ereg("([0-9]{2})([01]{1}[0-9]{1}[0-3]{1}[0-9]{1})-([12]{1}[0-9]{6})",$joomin, $regs))

print("Valid");

else print("Invalid format: $joomin");


int eregi(string givenPattern, string givenString, array matched);

- case-insensitive version of ereg


Example :

Code => $email="xs9_tx-abc.yyy_c@cne.kyungsung.ac.kr";

eregi("(^[_\.0-9a-z-]+)@(([0-9a-z][0-9a-z-]+\.)+)([a-z]{2,3}$)",$email,$matched);

while (list($a,$b)=each($matched))

if ($b) print("$a, $b <br>");


Result => 0, xs9_tx-abc.yyy_c@cne.kyungsung.ac.kr

1, xs9_tx-abc.yyy_c

2, cne.kyungsung.ac.

3, ac.

4, kr



Code => eregi("^[_\.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$",$email,$matched);

while (list($a,$b)=each($matched))

if ($b) print("$a, $b <br>");

Result => 0, xs9_tx-abc.yyy_c@cne.kyungsung.ac.kr

1, ac.
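
The group numbering in the first e-mail example can be checked with the following Python sketch. The pattern is the one from the example with re.IGNORECASE standing in for eregi; split_email is a hypothetical helper name. A repeated group such as group 3 keeps only its last repetition, which is why it holds "ac.". The pattern is only as strict as the original and is not a general e-mail validator.

```python
import re

EMAIL = re.compile(
    r"(^[_.0-9a-z-]+)@(([0-9a-z][0-9a-z-]+\.)+)([a-z]{2,3}$)",
    re.IGNORECASE,
)

def split_email(address: str):
    """Return the four captured groups, or None if the address does not match."""
    m = EMAIL.search(address)
    return list(m.groups()) if m else None

if __name__ == "__main__":
    print(split_email("xs9_tx-abc.yyy_c@cne.kyungsung.ac.kr"))
    print(split_email("not an address"))
```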



string ereg_replace(string givenPattern, string replacementPattern, string givenString);

- Finds the text in givenString that matches givenPattern (the matched text) and

replaces it with replacementPattern

- If givenPattern contains groups wrapped in "(pattern)", replacementPattern can refer to them with "\\digit" (digit is one of 0, 1, ... , 9). Each "\\digit" is replaced by the text that the digit-th "(pattern)" group matched in givenString, counting groups by their opening parenthesis. "\\0" stands for the entire matched text.

- Returns the modified string

- case sensitive
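
Group references in a replacement work the same way in Python's re.sub, which is used here as a stand-in for ereg_replace. This is a sketch: the variable names are made up, \1 ... \9 play the role of "\\digit", and \g<0> is Python's spelling of "\\0". Groups are numbered by the position of their opening parenthesis, so in "(( )is)" group 1 is " is" and group 2 is " ".

```python
import re

TEXT = "This is a test"

# No groups: plain replacement of the matched text.
plain = re.sub(r" is", " was", TEXT)

# \1 re-inserts the space captured by the first group.
by_group = re.sub(r"( )is", r"\1was", TEXT)

# Nested groups: 1=" is", 2=" ", 3=" a", 4=" ", 5=" test", 6=" ".
inner = re.sub(r"(( )is)(( )a)(( )test)", r"\2was\4an\6exam", TEXT)
outer = re.sub(r"(( )is)(( )a)(( )test)", r"\1was\2an\3exam", TEXT)

# \g<0> is the whole matched text.
whole = re.sub(r"t[a-z]+", r"[\g<0>]", TEXT)

if __name__ == "__main__":
    for result in (plain, by_group, inner, outer, whole):
        print(result)
```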


Example :

Code => $string = "This is a test";

print(ereg_replace(" is", " was",$string)); print("<br>");

print(ereg_replace("( )is","\\1was",$string)); print("<br>");

print(ereg_replace("(( )is)","\\2was",$string)); print("<br>");

print(ereg_replace("(( )is)(( )a)(( )test)", "\\2was\\4an\\6exam",$string));

Result => "This was a test";

"This was a test";

"This was a test";

"This was an exam";


Example 2 : removing redundant whitespace

Code => $str ="~ s/\s+/ /g";

$str = eregi_replace("[[:space:]]+", " ", $str);

print("$str<br>");

Result => ~ s/\s+/ /g
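
Collapsing whitespace looks like this in Python. This is a sketch with a made-up function name; Python's re has no [[:space:]] class, so \s is used as the closest equivalent.

```python
import re

def squeeze_whitespace(text: str) -> str:
    """Replace every run of whitespace with a single space."""
    return re.sub(r"\s+", " ", text)

if __name__ == "__main__":
    print(squeeze_whitespace("a  b\t\tc\n d"))
```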


string eregi_replace(string givenPattern, string replacementPattern, string givenString);

- case-insensitive version of ereg_replace


last modified 2021-02-07 05:23:39