Remove Special Characters from Column in PySpark DataFrame

Kontext · 14,760 views · 2 years ago · English

The Spark SQL function regexp_replace can be used to remove special characters from a string column in a Spark DataFrame. Depending on how "special characters" are defined, the regular expression will vary. For instance, [^0-9a-zA-Z_\-]+ matches any run of characters that is not alphanumeric, a hyphen (-), or an underscore (_), while '[@\+\#\$\%\^\!]+' matches only the characters listed inside the brackets.

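The difference between the two patterns can be checked with Python's built-in re module, which uses the same character-class syntax for these expressions (the sample string here is chosen for illustration and is not from the original snippet):

```python
import re

s = 'a b-c!@#'

# Negated class: keep ONLY alphanumerics, hyphens, and underscores.
# The space is removed along with the punctuation.
kept = re.sub(r'[^0-9a-zA-Z_\-]+', '', s)   # 'ab-c'

# Explicit class: remove ONLY the listed special characters.
# The space survives because it is not in the class.
stripped = re.sub(r'[@\+\#\$\%\^\!]+', '', s)  # 'a b-c'

print(kept, stripped)
```

The negated class is the safer default when the goal is "keep only these characters"; the explicit class is useful when only a known set of symbols must go.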
This code snippet replaces the matched special characters with an empty string.

Output:

+---+--------------------------+
|id |str                       |
+---+--------------------------+
|1  |ABCDEDF!@#$%%^123456qwerty|
|2  |ABCDE!!!                  |
+---+--------------------------+

+---+-------------------+
| id|       replaced_str|
+---+-------------------+
|  1|ABCDEDF123456qwerty|
|  2|              ABCDE|
+---+-------------------+