MWSUG 2018 – Paper SB-145

Perl Regular Expression – The Power to Know the PERL in Your Data

Kaushal Chaudhary, Eli Lilly and Company, Indianapolis, IN
Dhruba Ghimire, Eli Lilly and Company, Indianapolis, IN

ABSTRACT

Perl regular expressions are one of the most powerful and efficient techniques for complex string data manipulation. SAS offers a regular expression engine in Base SAS without any additional license requirement, which makes it a great addition to a SAS programmer's toolbox. In this paper, we present the basics of Perl regular expressions and various Perl regular expression functions and call routines, such as PRXPARSE(), PRXMATCH(), and CALL PRXCHANGE(), with examples. The presentation is intended for beginner and intermediate SAS programmers.

INTRODUCTION

Regular expressions are a great tool for text data manipulation. Many other programming languages include a regular expression engine to facilitate text data analysis. SAS has supported Perl regular expressions since version 9 and provides several functions and call routines for using them. Some of these overlap with regular SAS character functions such as substr() and scan(); however, in many situations they offer extra flexibility. The PRXPARSE() function makes it possible to handle complex strings by taking as its argument a pattern built from a wealth of metacharacters and regular expression types.

A non-exhaustive list of metacharacters and regular expression types (character classes, grouping, alternation, repetition, and anchored expressions) is presented in Table 1. A thorough understanding of these is necessary to master regular expressions in any programming language, as they are more or less similar across languages.

Metacharacters and Regular Expressions

Metacharacters are characters that have a special meaning in regular expressions. To match one literally, escape it with a backslash (\); for example, \. matches '.'.
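The effect of escaping can be checked with a short DATA step. This is only a sketch: the sample text and the variable names (txt, pos_any, pos_literal) are made up for illustration, and it relies on the PRXMATCH function described later in the paper.

```sas
data _null_;
  length txt $20;
  txt = 'v9x4 and v9.4';                     /* hypothetical sample text */
  /* Unescaped dot matches any one character, so 'v9x4' qualifies */
  pos_any = prxmatch('/v9.4/', txt);
  /* Escaped dot matches only a literal period */
  pos_literal = prxmatch('/v9\.4/', txt);
  put pos_any= pos_literal=;
run;
```

With this sample text the log should show pos_any=1 and pos_literal=10, because only the second token contains a real period.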
The following table lists the metacharacters and the different types of regular expressions, with a description and an example of each.

Table 1. Metacharacters and regular expression types

Pattern         Description                                           Example
.               Matches any one character                             . matches '1', 'a', etc.
+               Matches the preceding character one or more times     abc+ matches 'abc', 'abcc', 'abccc', etc.
?               Matches the preceding character zero or one time      abc? matches 'abc', 'ab'
*               Matches the preceding character zero or more times    abc* matches 'ab', 'abc', 'abcc', 'abccc'
+?              Matches as few characters as possible (lazy)          a+? matches 'a' in 'aaaaa'
\d              Matches digit characters                              \d matches '1' in 'a1'
\D              Matches non-digit characters                          \D matches 'a' in 'a1'
\w              Matches word characters                               \w matches 'a' in 'a1'
\W              Matches non-word characters                           \W matches '-' in 'a-1'
\s              Matches white space
\S              Matches non-white space
\b              Word boundary                                         \bMWSUG\b matches 'MWSUG' in 'This is MWSUG 2018' but not in 'MWSUG2018'
\B              Non-word boundary
^               Matches at the beginning of the string                ^This matches 'This' in 'This is MWSUG 2018'
$               Matches at the end of the string                      2018$ matches '2018' in 'This is MWSUG 2018'
\(              Matches '('                                           \(MWSUG matches '(MWSUG'
\)              Matches ')'                                           MWSUG\) matches 'MWSUG)'
\\              Matches '\'                                           abc\\123 matches 'abc\123'
[abc]           Character set                                         Matches a, b, or c
[^abc]          Character set (negated)                               Matches any character other than a, b, or c
[A-Z1-9]        Character set (ranges)                                Matches uppercase letters A-Z and digits 1-9
()              Grouping                                              /(abc)+/ matches 'abc' and 'abcabcabc'
|               Alternation                                           /abc|def/ matches 'abc' or 'def'
\d{m}           Matches exactly m digits                              \d{2} matches '12'
\d{m,}          Matches at least m digits                             \d{2,} matches '12', '123'
\d{m,n}         Matches at least m and at most n digits               \d{2,3} matches '12', '123'
\w{m}           Matches exactly m word characters                     \w{2} matches 'ab'
\w{m,}          Matches at least m word characters                    \w{2,} matches 'ab', 'abc'
\w{m,n}         Matches at least m and at most n word characters      \w{2,3} matches 'ab' and 'abc'
[[:alpha:]]     POSIX character class                                 Matches all alphabetic characters (a-z, A-Z)
[[:digit:]]     POSIX character class                                 Matches all digits (0-9)

FUNCTIONS

PRXPARSE(), PRXMATCH(), PRXCHANGE(), PRXPOSN(), and PRXPAREN() are illustrated below with examples.

PRXPARSE

Syntax: PRXPARSE(Perl-regular-expression)

Perl-regular-expression: the pattern to be parsed.
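As a minimal sketch of the call, the step below compiles a pattern, checks the returned identifier, and frees it afterwards. The pattern and the literal test string are illustrative choices, not taken from the paper's programs. The sketch assumes the documented behaviour that PRXPARSE returns a missing value when the pattern cannot be compiled, and it uses CALL PRXFREE to release the compiled pattern.

```sas
data _null_;
  /* PRXPARSE returns a numeric identifier for the compiled pattern */
  re = prxparse('/\d{3}-\d{5}/');
  if missing(re) then put 'ERROR: pattern did not compile';
  else do;
    /* The identifier can be passed to other PRX functions */
    pos = prxmatch(re, 'WWW-100-01001');
    put pos=;
    call prxfree(re);   /* release the compiled pattern */
  end;
run;
```

Here pos should be 5, the position where '100-01001' begins.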

PRXPARSE uses metacharacters to construct the Perl regular expression. It compiles the expression so that other Perl regular expression functions and call routines can use it for pattern matching on a character value.

Program 1

    data have;
      input patient $ 1-15 string $50.;
      datalines;
    WWW-100-01001  CERVICAL PAIN (MUSCULAR)
    XXX-200-02001  MUSCULO-SQUELETTIC PAIN
    YYY-300-03001  back pain
    ZZZ-400-04001  ABDOMINAL PAIN CHEST PAIN
    ;
    run;

    data want;
      set have;
      retain pattern;
      if _n_ = 1 then
        pattern = prxparse('/pain/i');
      pos = prxmatch(pattern, string);
    run;

In the program above, the regular expression (pattern) is compiled during the first iteration (_n_ = 1) of the data step and retained for the remaining iterations. An alternative is to use the modifier 'o', which compiles the expression only once. The example program has a single regular expression; however, multiple regular expressions can be created in a single data step. The modifier 'i' makes the pattern matching case-insensitive, so the pattern matches every string containing 'pain' or 'PAIN'.

PRXMATCH

Syntax: PRXMATCH(pattern-id or regular-expression, string)

Pattern-id: the value returned by the PRXPARSE function; regular-expression: a Perl regular expression; string: a character value.

The PRXMATCH function searches for a pattern match and returns the position at which the pattern is found. If no match is found, PRXMATCH returns zero. If multiple matches are found, only the position of the first match is returned.

Program 2

    data want;
      set have;
      if _n_ = 1 then
        pattern = prxparse('/PAIN/');
      retain pattern;
      pos = prxmatch(pattern, string);
    run;

Case I: In the string CERVICAL PAIN (MUSCULAR), PRXMATCH returns 10, the position of the first character of the match (PAIN).
Case II: In the string back pain, no match is found because the pattern is case-sensitive, so PRXMATCH returns zero.
Case III: In ABDOMINAL PAIN CHEST PAIN, the pattern matches twice, but only the position of the first match, 11, is returned.

PRXCHANGE

Syntax: PRXCHANGE(Perl-regular-expression | regular-expression-id, times, source)

Perl-regular-expression: the pattern to be parsed; regular-expression-id: the value returned by the PRXPARSE function; times: the number of times to perform the match and substitution; source: the character string in which the pattern is searched.

The PRXCHANGE function performs a replacement for a matched pattern. The 's' before the first delimiter indicates a substitution. The first argument of the function has two components: find and replace.

Program 3

    data want;
      set have;
      update = prxchange('s/pain/ACHE/i', -1, string);
    run;

In Program 3, 'pain' is replaced by 'ACHE' in the string. The modifier 'i' makes the match case-insensitive, so every 'PAIN' in the string is replaced as well. The second argument, -1, indicates that all occurrences found in the variable string are replaced.

PRXPOSN

Syntax: PRXPOSN(regular-expression-id, capture-buffer, source)

Regular-expression-id: the value returned by the PRXPARSE function; capture-buffer: a number indicating which capture buffer is to be evaluated; source: the character string in which the pattern is searched.
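The capture-buffer argument can be sketched as follows. The patient value is borrowed from Program 1, while the variable names whole and site are hypothetical. The sketch relies on the documented behaviour that capture buffer 0 holds the entire match, which the paper itself does not cover.

```sas
data _null_;
  length patient whole site $15;
  patient = 'WWW-100-01001';
  re = prxparse('/(\w+)-(\d{3})-(\d{5})/');
  /* A match must be made before the capture buffers can be read */
  if prxmatch(re, patient) then do;
    whole = prxposn(re, 0, patient);   /* buffer 0: the entire match */
    site  = prxposn(re, 2, patient);   /* buffer 2: second pair of parentheses */
    put whole= site=;
  end;
run;
```

For this value the log should show whole=WWW-100-01001 and site=100.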

The PRXPOSN function returns the text held in the specified capture buffer. PRXMATCH, PRXSUBSTR, PRXNEXT, or PRXCHANGE must be called before PRXPOSN so that the capture buffers are populated. In addition, PRXPOSN requires a regular-expression id rather than the regular expression itself.

Program 4

    data want;
      length study site patid $10;
      keep study site patid;
      retain re;
      if _n_ = 1 then
        re = prxparse('/(\w+)-(\d{3})-(\d{5})/');
      set have;
      if prxmatch(re, patient) then
        do;
          study = prxposn(re, 1, patient);
          site = prxposn(re, 2, patient);
          patid = prxposn(re, 3, patient);
        end;
    run;

In Program 4, the regular-expression id 're' is created with the PRXPARSE function. If a match exists, capture buffers 1, 2, and 3 are used to extract study, site, and patid from the source (patient) with the PRXPOSN function.

PRXPAREN

Syntax: PRXPAREN(regular-expression-id)

Regular-expression-id: the value returned by the PRXPARSE function.

The PRXPAREN function returns the number of the last capture buffer that took part in the most recent match. It is used together with PRXMATCH, PRXSUBSTR, PRXNEXT, or PRXCHANGE, and it requires the regular-expression id rather than the regular expression.

Program 5

    data want;
      set have;
      pattern = prxparse('/(PAIN)|(CERVICAL)|(ABDOMINAL)/');
      pos = prxmatch(pattern, string);
      if pos then paren = prxparen(pattern);
    run;

In Program 5, 'PAIN', 'CERVICAL', and 'ABDOMINAL' are enclosed in parentheses in the pattern to create capture buffers. In the first observation, CERVICAL matches the second set of parentheses, with pos = 1. In the second observation, PAIN matches the first set of parentheses, with pos = 20. In the third observation, lowercase 'pain' does not match the pattern, so paren is missing.

CALL ROUTINES

Some of the Perl regular expression functions have a call routine counterpart. These call routines are similar to the functions, but they yield more information. We discuss some of the commonly used call routines next.

CALL PRXCHANGE

Syntax: CALL PRXCHANGE(regular-expression-id, times, old-string, new-string, result-length, truncation-value, number-of-changes)

Regular-expression-id: the unique numeric regular-expression id; times: the number of times the matching pattern is replaced; old-string: the source text string; new-string: the new variable that receives the string after the matching pattern is replaced; result-length: a numeric variable holding the number of characters copied into the result; truncation-value: a Boolean value (1 or 0) indicating whether the replacement result is longer than new-string; number-of-changes: a numeric variable holding the number of replacements made.

CALL PRXCHANGE() is similar to the PRXCHANGE() function. However, it accepts only a regular-expression id as its first argument, and it can also create a new variable (new-string in the syntax) to hold the result of the replacement.

In Program 6, we replace '2018' with '2019' in the txt variable. The 'newtxt' variable is created to store the new string. In Program 7, the resulting string is stored back in the txt variable without creating a new variable. We can also change the order of the parts of a string by creating capture groups and referencing them in the replacement by their position in the pattern, preceded by a dollar sign, as shown in Program 8.
In Program 8, the 'newtxt' variable holds the parts of the original text in reversed order.

Program 6

    data have;
      txt = 'MWSUG 2018';
    run;

    data want;
      length newtxt $14;
      set have;
      retain re;
      if _n_ = 1 then re = prxparse('s/\d+/2019/');
      call prxchange(re, -1, txt, newtxt);
      keep txt newtxt;
    run;

Program 7

    data want;
      set have;
      retain re;
      if _n_ = 1 then re = prxparse('s/\d+/2019/');
      call prxchange(re, -1, txt);
    run;

Program 8

    data want;
      length newtxt $14;
      set have;
      retain re;
      if _n_ = 1 then re = prxparse('s/(\w+)\s(\d+)/$2 $1/');
      call prxchange(re, -1, txt, newtxt);
      keep txt newtxt;
    run;

CALL PRXPOSN

Syntax: CALL PRXPOSN(regular-expression-id, capture-buffer, start, length)

Regular-expression-id: the unique numeric regular-expression id; capture-buffer: a numeric value giving the number of the capture buffer; start: a numeric variable that receives the position of the capture buffer; length: a numeric variable that receives the length of the capture buffer.

CALL PRXPOSN() returns the position and length of a capture buffer in variables, which lets us extract the desired part of the string later with regular SAS functions such as substr() or substrn(). In Program 9, the word is capture buffer 1 and the digits are capture buffer 2. Based on the position and length of these capture buffers, we can extract the substrings they represent. CALL PRXPOSN() is used after a matching pattern is found by PRXMATCH().

Program 9

    data want;
      set have;
      retain re;
      if _n_ = 1 then re = prxparse('/(\w+)\s(\d+)/');
      if prxmatch(re, txt) then do;
        call prxposn(re, 1, pos, len);
        call prxposn(re, 2, pos1, len1);
        Conf_name = substr(txt, pos, len);
        Conf_year = substr(txt, pos1, len1);
      end;
      keep txt Conf_name Conf_year;
    run;

CALL PRXNEXT

Syntax: CALL PRXNEXT(regular-expression-id, start, stop, source, position, length)

Regular-expression-id: the unique numeric regular-expression id; start: a numeric variable holding the position at which the search begins; stop: a numeric variable holding the position of the last character to search; source: the input text; position: a numeric variable that receives the position where the matching pattern is found; length: a numeric variable that receives the length of the string matched by the pattern.

CALL PRXNEXT() searches a string for the given pattern repeatedly, yielding the position and length of each match. In the program below, we look for words followed by a space.

Program 10

    data have;
      txt = 'This is MWSUG 2018';
    run;

    data _null_;
      set have;
      retain re;
      if _n_ = 1 then re = prxparse('/\w+\s/');
      start = 1;
      stop = length(txt);
      call prxnext(re, start, stop, txt, pos, len);
      do while (pos > 0);
        found = substr(txt, pos, len);
        put found= pos= len=;
        call prxnext(re, start, stop, txt, pos, len);
      end;
    run;

CALL PRXSUBSTR

Syntax: CALL PRXSUBSTR(regular-expression-id, source, position, length)

Regular-expression-id: the unique numeric regular-expression id; source: the input text; position: a numeric variable that receives the position where the matching pattern is found; length: a numeric variable that receives the length of the string matched by the pattern.

CALL PRXSUBSTR() finds the location and length of the substring that matches the pattern in a given character string. It creates two numeric variables, position and length, as shown in the syntax. Once these two values are known, the substring can be extracted.

Program 11

    data have;
      txt = 'MWSUG 2018';
    run;

    data want;
      set have;
      retain re re1;
      length Conf_name Conf_year $50;
      if _n_ = 1 then
        do;
          re = prxparse('/\w+/');
          re1 = prxparse('/\d+/');
        end;
      call prxsubstr(re, txt, pos, len);
      call prxsubstr(re1, txt, pos1, len1);
      if pos > 0 then Conf_name = substr(txt, pos, len);
      if pos1 > 0 then Conf_year = substr(txt, pos1, len1);
      keep txt Conf_:;
    run;
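The guard on the position value in Program 11 matters when a string has no match. The sketch below, which uses two made-up text values and a hypothetical variable named year, shows how CALL PRXSUBSTR might behave in both situations. It assumes the documented behaviour that position and length are both set to zero when the pattern is not found.

```sas
data _null_;
  length txt $20 year $4;
  re = prxparse('/\d+/');
  /* First value contains digits, second does not */
  do txt = 'MWSUG 2018', 'MWSUG';
    call prxsubstr(re, txt, pos, len);
    /* pos is 0 when nothing matches, so guard the SUBSTR call */
    if pos > 0 then year = substr(txt, pos, len);
    else year = '';
    put txt= pos= len= year=;
  end;
run;
```

For 'MWSUG 2018' the log should show pos=7 and len=4 with year=2018; for 'MWSUG' both pos and len should be 0 and year stays blank.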

CONCLUSION

In this paper, we introduced Perl regular expressions in SAS along with the related functions and call routines. We used deliberately simple examples to explain them clearly. We hope this gets you started using them and encourages you to explore them in more depth. You will soon find how powerful they are.

CONTACT INFORMATION

Your comments, questions, and suggestions are valued and encouraged. Contact the authors at:

Kaushal Raj Chaudhary
Eli Lilly and Company
Lilly Corporate Center, Indianapolis
Email: Chaudhary_kaushal_raj@lilly.com

Dhruba R Ghimire
Eli Lilly and Company
Lilly Corporate Center, Indianapolis
Email: ghimire_dhruba_r@lilly.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
