๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

๊ฐœ๋ฐœ/AI

ํ•œ๊ธ€ ํ˜•ํƒœ์†Œ ๋ถ„์„ java/Okt/TwitterKoreanProcessorJava

dependencies {
	implementation 'com.twitter.penguin:korean-text:4.4'
}
// Normalize
CharSequence normalized = TwitterKoreanProcessorJava.normalize(dailyChatMessage.getMessage());


// Tokenize
Seq<KoreanTokenizer.KoreanToken> tokens = (Seq<KoreanTokenizer.KoreanToken>) TwitterKoreanProcessorJava.tokenize(normalized);

// Stemming
Seq<KoreanTokenizer.KoreanToken> stemmed = (Seq<KoreanTokenizer.KoreanToken>) TwitterKoreanProcessorJava.stem(tokens);

// ์ŠคํŠธ๋ง ๋ฆฌ์ŠคํŠธ [์˜ค๋Š˜, ์–ด์ œ, ์Šฌํ”„๋‹ค]
List<String> stemmedStringList = TwitterKoreanProcessorJava.tokensToJavaStringList(stemmed);

for (String string : stemmedStringList) {
	System.out.println(string);
}

// KoreanTokenJava ๋ฆฌ์ŠคํŠธ [ํ•™๊ต(Noun: 15, 2), ๊ฐ€๋‹ค(Verb: 18, 2), ๊ฐ€์กฑ(Noun: 0, 2)]
List<KoreanTokenJava> stemmedKokenList = TwitterKoreanProcessorJava.tokensToJavaKoreanTokenList(stemmed);
             
for (KoreanTokenJava koreanTokenJava : stemmedKokenList) {
	if (koreanTokenJava.getPos().equals(Noun) || koreanTokenJava.getPos().equals(Adjective)) {
		System.out.println(koreanTokenJava.getText());
	}
}


//์–ด๊ตฌ ๋ฆฌ์ŠคํŠธ
List<KoreanPhraseExtractor.KoreanPhrase> phrases = TwitterKoreanProcessorJava.extractPhrases(tokens, true, true);

for (KoreanPhraseExtractor.KoreanPhrase item : phrases) {
	System.out.println(item.text());
}

 

koreanTokenJava.getPos() : ํ’ˆ์‚ฌ๋ฅผ ๊ฐ€์ ธ์˜จ๋‹ค

ํ’ˆ์‚ฌ ์ข…๋ฅ˜

// Word leved POS
Noun, Verb, Adjective,
Adverb, Determiner, Exclamation,
Josa, Eomi, PreEomi, Conjunction,
NounPrefix, VerbPrefix, Suffix, Unknown,

// Chunk level POS
Korean, Foreign, Number, KoreanParticle, Alpha,
Punctuation, Hashtag, ScreenName,
Email, URL, CashTag,

// Functional POS
Space, Others,

ProperNoun;

 

 

java ์˜ˆ์‹œ๊ฐ€ ์ž˜์—†์–ด์„œ ์ ์–ด๋ณธ๋‹ค

๋ฐ˜์‘ํ˜•

'๊ฐœ๋ฐœ > AI' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€