SET_FROM
SET_FROM:
  arg: { arg } # required
  separator: { separator } # required
Description
The SET_FROM operation converts a string (arg) into a Set by splitting it using a specified separator. The resulting set contains unique items; how uniqueness, case sensitivity, and whitespace are handled depends on whether the input arg is a Text or Bytes type.
This operation differs from LIST_FROM in that it creates an unordered collection of unique elements, whereas LIST_FROM preserves order and duplicates.
SET_FROM discards empty string elements generated during the split. However, what counts as an "empty string" differs between Text and Bytes inputs.
Parameters
- arg (Operation<Text|Bytes>, required): The string value to be split into a set. This should be an operation that resolves to either a Text or Bytes value.
- separator (string, required): The character or string used to split the arg string.
Return Type
A Set containing the unique, non-empty items produced by the split.
Type-Specific Behavior
The behavior of SET_FROM depends on the type of the input arg.
- When arg is Text:
  - Normalization: Each element produced by the split is normalized first. It is converted to lowercase, leading and trailing whitespace characters (spaces, tabs, newlines) are removed, and repeated internal whitespace is collapsed.
  - Discarding Empty Items: After normalization, any element that becomes an empty string ("") is discarded and not included in the final set.
  - Uniqueness: Uniqueness is determined after normalization, making the comparison case-insensitive.
- When arg is Bytes:
  - No Normalization: The Bytes type preserves case and all whitespace characters exactly as they are. No normalization occurs.
  - Discarding Empty Items: Only elements that are an exact empty string ("") after the split are discarded. Strings containing only whitespace (e.g., " ") are considered valid, non-empty items and are included in the set.
  - Uniqueness: Uniqueness is determined based on the exact, case-sensitive byte sequence.
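The two modes described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the actual implementation: the function name set_from is hypothetical, a Python str stands in for Text and bytes stands in for Bytes, and collapsing internal whitespace to a single space is an assumption, since the exact collapsing rule is not specified here.

```python
import re


def set_from(arg, separator):
    """Hypothetical model of SET_FROM (not the real implementation).

    str stands in for Text, bytes stands in for Bytes.
    """
    if isinstance(arg, bytes):
        # Bytes: no normalization; only exact empty pieces are dropped.
        return {piece for piece in arg.split(separator.encode()) if piece != b""}

    items = set()
    for piece in arg.split(separator):
        # Text: lowercase, trim both ends, collapse internal whitespace runs.
        # Collapsing to a single space is an assumption of this sketch.
        item = re.sub(r"\s+", " ", piece.strip().lower())
        if item:  # pieces that normalize to "" are discarded
            items.add(item)
    return items


# Sorted only to make the printed order deterministic; sets are unordered.
print(sorted(set_from(" TagA , tagB, taga\n, , tagc ", ",")))
# ['taga', 'tagb', 'tagc']
print(sorted(set_from(b"ID-A,id-a,ID-B,, ID-C ", ",")))
# [b' ID-C ', b'ID-A', b'ID-B', b'id-a']
```

Under these assumptions, a whitespace-only piece disappears from a Text input but survives in a Bytes input, which is the main practical difference between the two modes.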
Examples
- Creating a Set of Unique Tags (Text)
  This example demonstrates how SET_FROM with a Text input handles duplicates, case differences, and extra whitespace.
  SET_FROM:
    arg:
      TEXT: " TagA , tagB, taga\n, , tagc "
    separator: ","
  Resulting Set: A set containing three unique, normalized items: {"taga", "tagb", "tagc"}.
  Explanation:
  - " TagA " and " taga\n" both normalize to "taga", and only one is kept.
  - " tagB" normalizes to "tagb".
  - " tagc " normalizes to "tagc".
  - The whitespace-only element between the two commas (", ,") normalizes to an empty string and is discarded.
- Creating a Set from a List of IDs (Bytes)
  This example shows how SET_FROM with a Bytes input preserves case and whitespace, while still discarding strictly empty elements.
  SET_FROM:
    arg:
      BYTES: "ID-A,id-a,ID-B,, ID-C "
    separator: ","
  Resulting Set: A set containing four unique, exact items: {"ID-A", "id-a", "ID-B", " ID-C "}.
  Explanation:
  - "ID-A" and "id-a" are treated as two distinct items due to case sensitivity.
  - The empty string between the two commas (",,") is discarded.
  - " ID-C " is included with its leading and trailing spaces intact.
Relevant Unit Tests
- Unit tests for SET_FROM with Text items can be found here: /ce/unit-test/set-from/text-items/unit-test.logic.yaml.
- Unit tests for SET_FROM with Bytes items can be found here: /ce/unit-test/set-from/bytes-items/unit-test.logic.yaml.