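Last updated: 2008-06-11
def truncate_u(text, length = 30, truncate_string = "...")
  l = 0
  char_array = text.unpack("U*")   # UTF-8 string -> array of Unicode code points
  char_array.each_with_index do |c, i|
    l += (c < 128 ? 0.5 : 1)       # ASCII (0..127) counts as half width, anything else as full width
    if l >= length
      # cut after character i; append the marker only if characters remain
      return char_array[0..i].pack("U*") + (i < char_array.length - 1 ? truncate_string : "")
    end
  end
  return text
end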
Last updated: 2008-06-12
This problem could actually be extended to truncating strings in Chinese, Japanese, Korean and similar scripts...
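One way to generalize beyond Chinese is to decide the width of each character from Unicode ranges instead of treating every non-ASCII code point as full width. The sketch below assumes Ruby 1.9+ and a deliberately short, hypothetical range table (`WIDE_RANGES`) covering the main CJK ideograph, kana, Hangul and fullwidth blocks; it is not a complete implementation of Unicode's East Asian Width property. Everything outside the table counts as half width, and the cut rule mirrors the unpack-based `truncate_u` (stop as soon as the running width reaches `length`). `wide_char?` and `truncate_wide` are hypothetical names.

```ruby
# Hypothetical, simplified table of double-width code point ranges
# (NOT the full Unicode East Asian Width data).
WIDE_RANGES = [
  0x1100..0x115F, # Hangul Jamo initial consonants
  0x2E80..0x303E, # CJK radicals, Kangxi radicals, CJK symbols and punctuation
  0x3041..0x33FF, # Hiragana through CJK Compatibility
  0x3400..0x4DBF, # CJK Unified Ideographs Extension A
  0x4E00..0x9FFF, # CJK Unified Ideographs
  0xAC00..0xD7A3, # Hangul syllables
  0xF900..0xFAFF, # CJK Compatibility Ideographs
  0xFF00..0xFF60, # Fullwidth forms
  0xFFE0..0xFFE6  # Fullwidth signs
].freeze

# true if the character's code point falls in one of the ranges above
def wide_char?(char)
  code_point = char.ord
  WIDE_RANGES.any? { |range| range.cover?(code_point) }
end

# Wide characters count 1, everything else 0.5; stop once the
# running width reaches `length` (same rule as the unpack version).
def truncate_wide(text, length = 30, truncate_string = "...")
  width = 0.0
  text.each_char.with_index do |char, i|
    width += wide_char?(char) ? 1 : 0.5
    next if width < length
    more_left = i < text.length - 1
    return text[0..i] + (more_left ? truncate_string : "")
  end
  text
end

puts truncate_wide("日本語テキスト", 3)
puts truncate_wide("한국어abc", 2)
```

A real application would more likely use a complete East Asian Width table; the hard-coded ranges just keep the example self-contained.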
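Last updated: 2008-06-12
After reading Lao Zhuang's (老庄) solution I learned about String's pack/unpack methods for the first time :)
My solution uses a regular expression:
def truncate_u(text, length = 30, truncate_string = "...")
  # Ruby 1.8: the 'n' flag makes the regexp work on raw bytes. One unit is either
  # 1-2 bytes outside the UTF-8 3-byte ranges (in practice ASCII) or one 3-byte
  # UTF-8 character; the pattern asks for exactly `length` such units.
  if r = Regexp.new("(?:(?:[^\xe0-\xef\x80-\xbf]{1,2})|(?:[\xe0-\xef][\x80-\xbf][\x80-\xbf])){#{length}}", true, 'n').match(text)
    r[0].length < text.length ? r[0] + truncate_string : r[0]
  else
    text   # fewer than `length` units: nothing to cut
  end
end
Compared with Lao Zhuang's solution it is much harder to read, but when length is fairly small (< 50) it performs somewhat better. Here is the benchmark code I used as well: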
require 'benchmark'

test_suites = [
  ["english string", 2],
  ["中文字符串", 2],
  ["中文 and english", 6],
  ["中文 and english", 8],
  ["veryveryveryveryveryveryveryveryveryveryveryveryveryverylongstring", 20],
  ["很长verylong很长verylong很长verylong很长verylong很长很长很长很长很的字符串", 30]
]

br = Benchmark.bmbm do |b|
  b.report("truncate_u benchmark") do
    5000.times {
      test_suites.each { |t| truncate_u(t[0], t[1]) }
    }
  end
end
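A rough adaptation of the regex idea for Ruby 1.9+, where strings are character-aware and no byte ranges are needed. It keeps the post's unit definition (one or two ASCII characters, or a single non-ASCII character, per unit), so a lone ASCII character squeezed between CJK characters still costs a full unit; that is likely what the 'ab你c好d' test in a later reply exposes. `truncate_re` is a hypothetical name, and unlike the original this pattern is anchored with `\A`.

```ruby
# Ruby 1.9+ sketch: match exactly `length` units from the start of the string,
# where a unit is 1-2 ASCII characters or one non-ASCII character.
def truncate_re(text, length = 30, truncate_string = "...")
  m = /\A(?:[\x00-\x7F]{1,2}|[^\x00-\x7F]){#{length}}/.match(text)
  return text unless m   # fewer than `length` units: nothing to cut
  m[0].length < text.length ? m[0] + truncate_string : m[0]
end

puts truncate_re("english string", 2)
puts truncate_re("ab你c好d", 3)   # "c" alone still counts as a whole unit
```

The regexp is rebuilt on every call because `length` is interpolated; caching one compiled pattern per length would be an easy optimization if this ever mattered.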
Last updated: 2008-06-12
String's unpack/pack are extremely powerful; I only used one of their directives. From the Ruby documentation for String#unpack:
Decodes str (which may contain binary data) according to the format string, returning an array of each value extracted. The format string consists of a sequence of single-character directives, summarized in the table at the end of this entry. Each directive may be followed by a number, indicating the number of times to repeat with this directive. An asterisk ("*") will use up all remaining elements. The directives sSiIlL may each be followed by an underscore ("_") to use the underlying platform's native size for the specified type; otherwise, it uses a platform-independent consistent size. Spaces are ignored in the format string. See also Array#pack.
"abc \0\0abc \0\0".unpack('A6Z6')   #=> ["abc", "abc "]
"abc \0\0".unpack('a3a3')           #=> ["abc", " \000\000"]
"abc \0abc \0".unpack('Z*Z*')       #=> ["abc ", "abc "]
"aa".unpack('b8B8')                 #=> ["10000110", "01100001"]
"aaa".unpack('h2H2c')               #=> ["16", "61", 97]
"\xfe\xff\xfe\xff".unpack('sS')     #=> [-2, 65534]
"now=20is".unpack('M*')             #=> ["now is"]
"whole".unpack('xax2aX2aX1aX2a')    #=> ["h", "e", "l", "l", "o"]
This table summarizes the various formats and the Ruby classes returned by each.
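The directive the first solution relies on is `U`: on a UTF-8 string, `unpack("U*")` yields one Unicode code point per character, while `C*` yields the raw bytes. A short sketch (assuming Ruby 1.9+ with UTF-8 source) showing the difference, the `pack("U*")` round trip, and the same 0.5/1 width sum computed from the code points:

```ruby
text = "中文ab"

code_points = text.unpack("U*")      # one Integer per character: [20013, 25991, 97, 98]
bytes       = text.unpack("C*")      # one Integer per byte (中 and 文 are 3 bytes each in UTF-8)
round_trip  = code_points.pack("U*") # code points back to a UTF-8 string

# Half width for ASCII (< 0x80), full width for everything else
width = code_points.sum { |cp| cp < 0x80 ? 0.5 : 1 }   # 1 + 1 + 0.5 + 0.5 = 3.0

puts bytes.length   # 8
puts round_trip == text
```

The byte view is what the regex and StringIO solutions in this thread work with, which is why they have to assume that every non-ASCII character is exactly 3 bytes long.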
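Last updated: 2008-06-12
quake wang: your solution has a problem with the test string 'ab你c好d'.
Both experts' solutions are rather tricky, so I wrote a more basic one:
require 'stringio'
$KCODE = "u"

# Ruby 1.8: StringIO#getc returns a byte (Integer). Assumes every non-ASCII
# character is 3 bytes long, which holds for CJK characters in UTF-8.
def truncate_u(text, length = 30, truncate_string = "...")
  return text if text.size <= length   # byte count already within the limit
  ios = StringIO.new(text)
  cursor = 0                           # byte offset just past the last character taken
  while (c = ios.getc)
    break if length <= 0
    if c > 127
      length -= 1
      ios.seek(ios.tell + 2)           # skip the 2 continuation bytes to the next 'char'
    else
      length -= 0.5
    end
    cursor = ios.tell
  end
  if length < 0
    # overshoot ("1.5 happens!!!"): the last CJK char only half fit, drop its 3 bytes
    sub_str = text[0, cursor - 3]
  else
    sub_str = text[0, cursor]
  end
  if sub_str.size < text.size
    sub_str << truncate_string
  else
    sub_str
  end
end
The solution above was inspired by simohayha and by Lao Zhuang's 0.5 weighting, and it works with UTF-8 encoding. This quiz was really well chosen: I learned about StringIO, unpack, regular expressions and Benchmark. Well worth it....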
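In Ruby 1.9 and later, StringIO#getc returns a one-character String in the string's encoding rather than a byte, so the `seek(tell + 2)` trick and the 3-byte assumption are no longer needed. A minimal sketch of the same loop under that assumption (`truncate_io` is a hypothetical name); instead of correcting an overshoot afterwards, it simply refuses to add a character that would push the width past the limit.

```ruby
require 'stringio'

# Ruby 1.9+ sketch: getc yields whole characters, so only the width needs tracking.
def truncate_io(text, length = 30, truncate_string = "...")
  io = StringIO.new(text)
  out = +""
  width = 0.0
  while (c = io.getc)
    w = c.ord < 0x80 ? 0.5 : 1   # ASCII = half width, everything else = full width
    break if width + w > length  # drop a character that would only half fit
    out << c
    width += w
  end
  out.length < text.length ? out + truncate_string : out
end

puts truncate_io("ab你c好d", 2)
```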
Last updated: 2008-06-16
With Ruby 1.9 it becomes really simple:
#-*- coding:utf-8 -*-
puts "Once u你好pon a time in a world far far away"[0,15]
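In Ruby 1.9+, String#length and String#[] with (start, length) count characters rather than bytes, which is why the one-liner works. It gives every character the same weight, unlike the half-width (0.5) weighting in the earlier replies, and it does not append a truncation marker. A small sketch contrasting character and byte slicing, plus a hypothetical `truncate_chars` helper that adds the marker only when something was actually cut:

```ruby
sample = "中文abc"

char_count = sample.length              # 5 characters
byte_count = sample.bytesize            # 9 bytes in UTF-8
by_chars   = sample[0, 3]               # "中文a": the first 3 characters
by_bytes   = sample.byteslice(0, 3)     # "中": the first 3 bytes

# Hypothetical helper: character-count truncation with a marker
def truncate_chars(text, length = 30, truncate_string = "...")
  text.length > length ? text[0, length] + truncate_string : text
end

puts truncate_chars(sample, 3)
```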
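Last updated: 2008-06-25
$KCODE = 'u'
require 'jcode'   # Ruby 1.8: adds String#jsize, #each_char and #mbchar?

def truncate_u(text, length = 30, truncate_string = "...")
  # Short enough already: return it unchanged (no "..." and no mutation of text)
  return text if text.jsize <= length
  result = ""
  width = 0
  length = length * 2   # count in half-width units: ASCII = 1, multibyte = 2
  text.each_char { |c|
    if width < length
      if c.mbchar?
        result << c if width + 2 <= length   # drop a CJK char that would only half fit
        width += 2
      else
        result << c
        width += 1
      end
    end
    break if width >= length
  }
  # Append the marker only if something was actually cut off
  result.jsize < text.jsize ? result + truncate_string : result
end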
puts truncate_u("Helloa中文aaabbbbbbbbb",4)
puts truncate_u("Helloworld",4)
puts truncate_u("He中文lloworld",4)
puts truncate_u("H中文中文elloworld",4)
puts truncate_u("H中",4)
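jcode was dropped in Ruby 1.9 and $KCODE no longer has any effect there, so the version above only runs on 1.8. A sketch of the same doubled-width idea for Ruby 1.9+, assuming that any character encoded in more than one UTF-8 byte is double width (a stand-in for jcode's `mbchar?`; accented Latin letters would also count as wide under this rule). `truncate_w` is a hypothetical name.

```ruby
# Ruby 1.9+ sketch: widths in half-width units (ASCII = 1, multibyte = 2),
# limit = length * 2; a character that would overflow the limit is dropped.
def truncate_w(text, length = 30, truncate_string = "...")
  limit = length * 2
  width = 0
  result = +""
  text.each_char do |c|
    w = c.bytesize > 1 ? 2 : 1
    break if width + w > limit
    result << c
    width += w
  end
  result.length < text.length ? result + truncate_string : result
end

puts truncate_w("Helloa中文aaabbbbbbbbb", 4)
puts truncate_w("Helloworld", 4)
puts truncate_w("He中文lloworld", 4)
puts truncate_w("H中文中文elloworld", 4)
puts truncate_w("H中", 4)
```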
Last updated: 2008-06-25
sea gull wrote: With Ruby 1.9 it becomes really simple:
#-*- coding:utf-8 -*-
puts "Once u你好pon a time in a world far far away"[0,15]
Could you post the output as well?
Last updated: 2008-06-26
carlosbdw wrote: sea gull wrote: With Ruby 1.9 it becomes really simple:
#-*- coding:utf-8 -*-
puts "Once u你好pon a time in a world far far away"[0,15]
Could you post the output as well?
ruby truncate_test.rb
Once u你好pon a
Last updated: 2008-06-27
sea gull wrote: With Ruby 1.9 it becomes really simple:
#-*- coding:utf-8 -*-
puts "Once u你好pon a time in a world far far away"[0,15]
Very nice, very powerful; a good option.