1. 1st Source ¶

#include <map>
#include <string>
#include <iostream>
#include <fstream>

using namespace std;

class Anagram
{
private:
	typedef map< string, map<char, int> >::iterator MSMCII;
	typedef map< char, int > MCI;

	map< string, map<char, int> > _anagramTable;
public:
	void CalWhatAnagram(const string& str)
	{
		MCI howManyEachAlphabet;
		for(int i = 0 ; i < str.size() ; ++i)	
			howManyEachAlphabet[ str[i] ] += 1;		
		_anagramTable[str] = howManyEachAlphabet;
	}
	void Show()
	{
		while( !_anagramTable.empty() )
		{
			MCI value = _anagramTable.begin()->second;

			for(MSMCII i = _anagramTable.begin() ; i != _anagramTable.end() ; )
			{
				if(i->second == value)
				{
					cout << i->first << " ";
					_anagramTable.erase(i++);
				}
				else
				{
					++i;
				}
			}
			cout << endl;
		}
	}
};


int main()
{
	Anagram anagram;

	ifstream fin("input.dat");

	string t;

	while(!fin.eof())
	{
		getline(fin, t);
		anagram.CalWhatAnagram(t);
	}
	anagram.Show();

	fin.close();

	return 0;

}

2. 1st 분석 ¶

먼저 사전 파일을 입력받으면서, 키값은 그 단어, 키에 해당하는 값은 <알파벳, 그 알파벳의 출현 개수> Pair인 Pair를 생성한다.(--; 뭔가 좀 말이 이상하군)
- 여기서, 단어의 갯수를 n개, 단어의 평균 길이를 m이라 하면, 이것이 어떤 Pair인가 판단하는데 Θ(mn)의 시간이 걸린다. 다시 그것을 map 컨테이너에 집어넣는데 Θ(n)의 시간이 걸린다.
출력할때는 map 객체를 순회하면서, 한번 첨부터 끝까지 돌면서 anagram찾은건 지워준다.(좀 안좋은 방법 같기는 하다.)
- 이건 최대로 재수 없어도. Θ(n*n) 이상의 시간이 걸리지는 않는다.
일반적으로 단어의 갯수는 단어의 길이보다는... 아무래도 클것이다. 이 알고리즘은 총 Θ(n*n)의 수행시간이 걸린다고 할수 있다.

3. 1st 개선점 ¶

뭔가 더 좋은 방법 찾는중.
2만개짜리 단어장 구해서 파일 읽는 방식으로 바꿨다. 시간 재봐야겠다
기절하겠네~~; 3분 걸리네~~; 저게 10배로 불어나면..~~; 대충 5시간으로 불어난다는 것인가..~~;

4. 2nd Source ¶

~cpp 
#include <map>
#include <string>
#include <vector>
#include <iostream>
#include <fstream>

using namespace std;

class Anagram
{
private:
	typedef map<char, int> MCI;
	typedef vector<string> LS;
	typedef map< MCI, LS > MALS;
	
	typedef MALS::iterator MALSI;
	typedef LS::iterator LSI;

	MALS _anagramTable;

	MCI CalculateWhatAnagram(const string& str)
	{
		MCI howManyEachAlphabet;
		for(int i = 0 ; i < str.size() ; ++i)	
			howManyEachAlphabet[ str[i] ] += 1;		
		return howManyEachAlphabet;
	}
public:
	void BoundAnagram(char* fileName)
	{
		ifstream fin(fileName);
		string str;
		while(!fin.eof())
		{
			getline(fin, str);
			_anagramTable[ CalculateWhatAnagram(str) ].push_back(str); 
		}
	}
	void ShowAnagram()
	{
		for(MALSI i = _anagramTable.begin() ; i != _anagramTable.end() ; ++i)
		{
			for(LSI j = (i->second).begin() ; j != (i->second).end(); ++j)
			{
				cout << *j << " ";
			}
			cout << endl;
		}
	}
};

int main()
{
	Anagram ana;
	ana.BoundAnagram("input.dat");
	ana.ShowAnagram();
	return 0;
}

5. 2nd 분석 ¶

먼저 입력받을때에는 key : 어떤 알파벳이 몇번 나왔나 저장한 map 컨테이너, value : 그 string들의 list. 이런식으로 저장해준다.
1st 버젼은 출력부분에서 대부분의 시간을 까먹었었지만.. 이번엔 입력부분에서 90프로이상을 까먹는거 같다.
수행시간을 대충 계산해볼때, 단어의 개수를 n, 단어의 평균 길이를 m이라 하면, 입력 : Θ(mn), 출력 : Θ(n) 이므로 총 수행시간은 그런데 m은 n보다 훠~~~~~얼씬 작다. Θ(n)이 되는건가?--; 뭔가 좀 궤변 같다.

6. 2nd 개선점 ¶

뭔가 더 좋은 방법 찾는중.
1분(2만개짜리)으로 줄었다. 더 줄일수 있을까. 저게 10배로 불어나면..--; 10분정도 걸리는 걸까.
근데 파일에 출력하니까 10초(2만개짜리)만에 된다. 제길--; 파일에다 할껄
화면에 출력하는게 생각보다 많이 걸린다. 입력된건 "~~~ inputed!"라고 출력하게 했더니 10초가 걸리는데, 저걸 출력하지 않으니 1초도 안걸린다.
list를 vector로 바꾸고 컴퓨터 켜자 마자 측정하니 6.2초 걸린다.

7. 지금까지 느낀점 ¶

알고리즘 개선으로 이정도까지 시간이 줄어들다니.. 정말 놀랍다. 더 줄여봐야겠다.
자료구조도 잘 골라야 한다.
컴터 상태도 좋아야 한다.--;

8. 최종 결과 ¶

17만개짜리 파일에 쓰는데 6.2초

ClassifyByAnagram

Contents

1. 1st Source ¶

2. 1st 분석 ¶

3. 1st 개선점 ¶

4. 2nd Source ¶

5. 2nd 분석 ¶

6. 2nd 개선점 ¶

7. 지금까지 느낀점 ¶

8. 최종 결과 ¶