NORMALIZATION_FORMS = [:c, :kc, :d, :kd]
A list of all available normalization forms. See www.unicode.org/reports/tr15/tr15-29.html for more information about normalization.

UNICODE_VERSION = RbConfig::CONFIG["UNICODE_VERSION"]
The Unicode version supported by the implementation.

default_normalization_form [RW]
The default normalization used for operations that require normalization. It can be set to any of the normalizations in NORMALIZATION_FORMS.
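Each short form name corresponds to one of the four forms accepted by Ruby's built-in String#unicode_normalize. As a hedged illustration, the hypothetical FORM_MAP hash below pairs them up (the normalize source in this section uses a similar NORMALIZATION_FORM_ALIASES table) and normalizes one sample string in every form.

```ruby
# Hypothetical table pairing each name in NORMALIZATION_FORMS with the
# symbol that Ruby's String#unicode_normalize expects.
FORM_MAP = { c: :nfc, kc: :nfkc, d: :nfd, kd: :nfkd }.freeze

# "ﬁné": the "ﬁ" ligature (U+FB01), "n", and a precomposed "é" (U+00E9).
sample = "\uFB01n\u00E9"

# Normalize the sample in every form and keep the resulting codepoints.
results = FORM_MAP.transform_values do |ruby_form|
  sample.unicode_normalize(ruby_form).codepoints
end

results.each { |form, codepoints| puts "#{form}: #{codepoints.inspect}" }
# c: [64257, 110, 233]
# kc: [102, 105, 110, 233]
# d: [64257, 110, 101, 769]
# kd: [102, 105, 110, 101, 769]
```

Only the K forms apply compatibility mappings, which is why only :kc and :kd split the ligature into "f" and "i".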
Compose decomposed characters to the composed form.
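Canonical composition is what NFC performs. Assuming the input is an array of integer codepoints (the same shape the unpack_graphemes examples use), a rough stand-in built on Ruby's String#unicode_normalize packs the codepoints into a string, normalizes it to :nfc, and reads the codepoints back. The name compose_codepoints is hypothetical.

```ruby
# Hypothetical helper: canonically compose an array of codepoints (NFC).
def compose_codepoints(codepoints)
  # Pack the integers into a UTF-8 string, compose it,
  # then turn the result back into an array of integers.
  codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end

# "e" (101) followed by U+0301 COMBINING ACUTE ACCENT (769)
p compose_codepoints([101, 769]) # prints [233], i.e. U+00E9 "é"
```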
Decompose composed characters to the decomposed form.
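The inverse can be sketched the same way, assuming a choice between canonical decomposition (NFD) and compatibility decomposition (NFKD). The helper name decompose_codepoints and its compatibility: keyword are hypothetical, not the Rails signature.

```ruby
# Hypothetical helper: decompose codepoints canonically (NFD) or, when
# compatibility is true, with compatibility mappings as well (NFKD).
def decompose_codepoints(codepoints, compatibility: false)
  form = compatibility ? :nfkd : :nfd
  codepoints.pack("U*").unicode_normalize(form).codepoints
end

p decompose_codepoints([233])                        # prints [101, 769]
p decompose_codepoints([64257])                      # prints [64257] ("ﬁ" has no canonical decomposition)
p decompose_codepoints([64257], compatibility: true) # prints [102, 105]
```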
Returns the KC normalization of the string by default. NFKC is considered the best normalization form for passing strings to databases and validations.
- string - The string to perform normalization on.
- form - The form you want to normalize in. Should be one of the following: :c, :kc, :d, or :kd. Default is ActiveSupport::Multibyte::Unicode.default_normalization_form.
```ruby
# File activesupport/lib/active_support/multibyte/unicode.rb, line 118
def normalize(string, form = nil)
  form ||= @default_normalization_form

  # See https://www.unicode.org/reports/tr15, Table 1
  if alias_form = NORMALIZATION_FORM_ALIASES[form]
    ActiveSupport::Deprecation.warn(<<-MSG.squish)
      ActiveSupport::Multibyte::Unicode#normalize is deprecated and will be removed from Rails 6.1. Use String#unicode_normalize(:#{alias_form}) instead.
    MSG

    string.unicode_normalize(alias_form)
  else
    ActiveSupport::Deprecation.warn(<<-MSG.squish)
      ActiveSupport::Multibyte::Unicode#normalize is deprecated and will be removed from Rails 6.1. Use String#unicode_normalize instead.
    MSG

    raise ArgumentError, "#{form} is not a valid normalization variant", caller
  end
end
```
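Since this method is deprecated in favour of String#unicode_normalize, here is a minimal plain-Ruby sketch of the same behaviour: a short form name is translated to Ruby's symbol, :kc is used when no form is given (the KC default described above), and an unknown form raises ArgumentError. NORMALIZE_ALIASES and normalize_sketch are hypothetical names, and unlike the Rails method the sketch emits no deprecation warning.

```ruby
# Hypothetical alias table, analogous to NORMALIZATION_FORM_ALIASES
# in the Rails source above.
NORMALIZE_ALIASES = { c: :nfc, kc: :nfkc, d: :nfd, kd: :nfkd }.freeze

# Hypothetical stand-in for normalize: no Rails, no deprecation warning.
def normalize_sketch(string, form = :kc)
  ruby_form = NORMALIZE_ALIASES[form]
  raise ArgumentError, "#{form} is not a valid normalization variant" unless ruby_form

  string.unicode_normalize(ruby_form)
end

p normalize_sketch("\uFB01n\u00E9") == "fin\u00E9" # prints true (NFKC splits the ligature)
p normalize_sketch("\u00E9", :d) == "e\u0301"       # prints true (NFD decomposes é)
```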
Reverse operation of unpack_graphemes.
Unicode.pack_graphemes(Unicode.unpack_graphemes('क्षि')) # => 'क्षि'
```ruby
# File activesupport/lib/active_support/multibyte/unicode.rb, line 48
def pack_graphemes(unpacked)
  ActiveSupport::Deprecation.warn(<<-MSG.squish)
    ActiveSupport::Multibyte::Unicode#pack_graphemes is deprecated and will be removed from Rails 6.1. Use array.flatten.pack("U*") instead.
  MSG

  unpacked.flatten.pack("U*")
end
```
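The replacement suggested by the deprecation message works on plain Ruby arrays. A small sketch, assuming input in the nested shape unpack_graphemes returns (one inner array of codepoints per grapheme cluster):

```ruby
# "Café" with a decomposed é: "e" (101) + U+0301 COMBINING ACUTE ACCENT (769),
# grouped one inner array per grapheme cluster.
unpacked = [[67], [97], [102], [101, 769]]

# Flatten to one list of codepoints and pack each as a UTF-8 character.
packed = unpacked.flatten.pack("U*")

puts packed == "Cafe\u0301" # prints true
p packed.encoding           # prints #<Encoding:UTF-8>
```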
Replaces all ISO-8859-1 or CP1252 characters with their UTF-8 equivalents, resulting in a valid UTF-8 string.
Passing true will forcibly tidy all bytes, assuming that the string's encoding is entirely CP1252 or ISO-8859-1.
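A minimal sketch of the idea, assuming Windows-1252 (Ruby's Encoding::Windows_1252, i.e. CP1252) as the fallback; CP1252 agrees with ISO-8859-1 everywhere except the 0x80 to 0x9F control range, so it covers both. Valid UTF-8 is left alone and only invalid byte sequences are reinterpreted, unless force is true, in which case every byte is recoded. tidy_bytes_sketch and recode_cp1252 are hypothetical names; this is not presented as the Rails implementation.

```ruby
# Hypothetical helper: reinterpret raw bytes as CP1252 and transcode them
# to UTF-8. Anything that cannot be converted becomes U+FFFD.
def recode_cp1252(bytes)
  bytes.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
end

# Hypothetical sketch of tidying a string's bytes into valid UTF-8.
def tidy_bytes_sketch(string, force = false)
  return string if string.empty? || string.ascii_only?
  # Forced: treat the whole string as CP1252, even bytes that were valid UTF-8.
  return recode_cp1252(string) if force

  # Otherwise keep valid UTF-8 and recode only the invalid byte sequences.
  string.scrub { |bad| recode_cp1252(bad) }
end

p tidy_bytes_sketch("caf\xE9") == "caf\u00E9"               # prints true
p tidy_bytes_sketch("\x93hi\x94") == "\u201Chi\u201D"       # prints true (CP1252 smart quotes)
p tidy_bytes_sketch("caf\u00E9", true) == "caf\u00C3\u00A9" # prints true (forced mojibake)
```

The last line shows why forcing is only appropriate when the whole string really is CP1252 or ISO-8859-1: already-valid UTF-8 bytes get reinterpreted too.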
Unpack the string at grapheme boundaries. Returns an array of codepoint arrays, one per grapheme cluster.
Unicode.unpack_graphemes('क्षि') # => [[2325, 2381], [2359], [2367]]
Unicode.unpack_graphemes('Café') # => [[67], [97], [102], [233]]
```ruby
# File activesupport/lib/active_support/multibyte/unicode.rb, line 36
def unpack_graphemes(string)
  ActiveSupport::Deprecation.warn(<<-MSG.squish)
    ActiveSupport::Multibyte::Unicode#unpack_graphemes is deprecated and will be removed from Rails 6.1. Use string.scan(/\X/).map(&:codepoints) instead.
  MSG

  string.scan(/\X/).map(&:codepoints)
end
```
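In plain Ruby the same grouping is available directly. The sketch below uses a decomposed "é" to show a cluster of two codepoints, and contrasts the \X scan with String#grapheme_clusters (available since Ruby 2.5) and with String#length, which counts codepoints rather than user-perceived characters.

```ruby
word = "Cafe\u0301" # "Café" spelled with "e" + U+0301 COMBINING ACUTE ACCENT

# Scan for extended grapheme clusters (\X), then expand each into codepoints.
p word.scan(/\X/).map(&:codepoints)        # prints [[67], [97], [102], [101, 769]]

# String#grapheme_clusters yields the same grouping.
p word.grapheme_clusters.map(&:codepoints) # prints [[67], [97], [102], [101, 769]]

# length counts codepoints; the cluster count matches what a reader sees.
p word.length                              # prints 5
p word.grapheme_clusters.length            # prints 4
```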