ukai's blog: Debian Description Trends

Sunday, October 28, 2007

Debian Description Trends

~~A bit of escapism~~ I put together Debian Description Trends as material for Real UNIX MAGAZINE Day.

Debian Description Trends was built roughly as follows. First, extracting the words, making full use of UNIX shell power (or rather, hacked together very sloppily):


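# One snapshot per month; the paths are assumed to look like YYYY/MM/01/debian/...
for file in */*/01/debian/dists/sid/main/binary-i386/Packages.gz
do
  # "2007/10/01/debian/..." -> "2007/10/01" -> "2007-10"
  date=$(date -d "$(echo "$file" | sed -e 's/\/debian.*//')" +%Y-%m)
  gunzip < "$file" |
    # keep the Description lines and their continuation lines
    sed -ne 's/Description: //p' -e '/^ /p' |
    tr '[:upper:]' '[:lower:]' |
    # blank out everything except - / + and alphanumerics, then put one word per line
    sed -e 's/[^-\/+[:alnum:]]/ /g' \
        -e 's/[[:space:]][[:space:]]*/\n/g' |
    sort | uniq -c |
    sort -nr > ~/tmp/desc-words-"$date".lst
done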
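
For comparison, the same word extraction can be sketched in Ruby. This is not the code the site uses; `count_description_words` is a hypothetical helper that only roughly mirrors the pipeline above (it expects `Description: ` at the start of a line and drops empty words), and it takes the already decompressed Packages text as a string.

```ruby
# Hypothetical helper (not from the original post): count the words that
# appear in Description fields of a Packages file, given its text.
def count_description_words(packages_text)
  counts = Hash.new(0)
  packages_text.each_line do |line|
    line = line.chomp
    if line.start_with?("Description: ")
      text = line.sub("Description: ", "")
    elsif line.start_with?(" ")
      # continuation line (long description)
      text = line
    else
      next
    end
    # Lowercase, then turn everything except '-', '/', '+' and
    # alphanumerics into spaces, as the sed expression does.
    text.downcase.gsub(%r{[^\-/+[:alnum:]]}, " ").split.each do |word|
      counts[word] += 1
    end
  end
  counts
end
```

Sorting the result by descending count, for example with `counts.sort_by { |word, n| -n }`, gives about the same ordering as `sort | uniq -c | sort -nr`.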

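This produces desc-words-$YYYY-$mm.lst files holding the count for each word. At first I had a Ruby script generate a gnuplot script from these files to draw the graphs, but I figured it would be handier to be able to view them on the web, so partway through I switched to a CGI. Once I tried it, scanning the files and generating the gnuplot script turned out to be quite slow, so I loaded the data into SQLite.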


CREATE TABLE words (
  year integer,
  month integer,
  word text,  -- "string" would get NUMERIC affinity in SQLite; "text" keeps words as text
  count integer
);
CREATE INDEX words_word on words ( word );

Loading the data goes something like this:


#!/usr/bin/ruby
require 'sqlite3'
..
db = SQLite3::Database.new(dbname)
..
   sql.push("insert into words values (#{year}, #{month}, '#{kw}', #{n});")
..
db.execute_batch(sql.join("\n"))
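
The elided part has to turn each desc-words file into (year, month, word, count) values. A minimal sketch of that step, assuming the `desc-words-YYYY-MM.lst` naming and the `uniq -c` line format from the shell loop; `rows_from_list` is a hypothetical helper and not the loader actually used here.

```ruby
# Hypothetical helper (not from the original post): turn the `uniq -c`
# output for one month into rows for the words table.
def rows_from_list(filename, lines)
  m = File.basename(filename).match(/\Adesc-words-(\d{4})-(\d{2})\.lst\z/)
  raise ArgumentError, "unexpected file name: #{filename}" unless m
  year = m[1].to_i
  month = m[2].to_i
  lines.each_with_object([]) do |line, rows|
    # Each line looks like "   1234 word"; lines with no word are skipped.
    n, kw = line.strip.split(/\s+/, 2)
    next if kw.nil? || kw.empty?
    rows << [year, month, kw, n.to_i]
  end
end
```

Each row could then be passed as bind values to a parameterized statement such as `insert into words values (?, ?, ?, ?)` rather than being interpolated into the SQL string, which sidesteps quoting problems if a word ever contained a quote character.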

Inside the CGI, the data is pulled out like this:


#!/usr/bin/ruby
require 'sqlite3'
...
  keyword = q.gsub(/,/, " ").split  # treat commas as separators, then split on whitespace
  kwv = {}
  db = SQLite3::Database.new(DATAFILE)
  db.results_as_hash = true
  db.execute("select year, month, word, count from words "+
             "where word in " +
             "(#{keyword.collect{|v| '?'}.join(', ')})",
               *keyword) do |row|
     date = "#{row['year']}-#{row['month']}"
     kwv[date] ||= {}
     kwv[date][row['word']] = row['count'].to_s
  end
...
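
The two ideas in that query, one `?` placeholder per keyword and grouping the rows by month, can be tried without a database. A minimal sketch: `words_query` and `group_by_date` are hypothetical names, the rows are plain hashes shaped like what `results_as_hash` returns, and zero-padding the month is my own assumption (the CGI above does not pad) so that the keys sort chronologically.

```ruby
# Hypothetical helpers (not from the original CGI).

# Build the select statement with one "?" placeholder per keyword.
def words_query(keywords)
  placeholders = (["?"] * keywords.size).join(", ")
  "select year, month, word, count from words where word in (#{placeholders})"
end

# Group result rows into { "YYYY-MM" => { word => count_as_string } }.
def group_by_date(rows)
  kwv = {}
  rows.each do |row|
    # Zero-padded month (assumption) so the keys sort as strings.
    date = format("%04d-%02d", row["year"], row["month"])
    (kwv[date] ||= {})[row["word"]] = row["count"].to_s
  end
  kwv
end
```

With sqlite3-ruby the generated statement would be run as `db.execute(words_query(keyword), keyword)`, so the keywords are always passed as bind values and never pasted into the SQL.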
