Tuesday, August 22, 2017

Skills and Knowledge for InfoSec

As a consultant for an incident response firm, I find that the engagements we get are typically well defined as either a security or an operational incident. Every once in a while, though, a call comes in that sounds very security focused as described by the customer contact, but after we arrive onsite it turns out to be an operational incident. It can take a lot of experience to break a problem down to its roots and even begin to approach root cause.

There have been a lot of discussions flying around about the various skill levels required for InfoSec jobs. Along with that, many have expressed concerns about job postings and the requirements they list, and I joined in a little while back. Others have made bold statements that InfoSec jobs shouldn’t be entry-level jobs since the skills needed are gained through other roles. I am in the middle on that one. I have met some very smart people who seem to just ‘get it’ and do well in InfoSec without prior experience, and I have also met people who have spent 20 years in IT and don’t understand some of the basic concepts. It really goes both ways.

Although I don’t hold the opinion that InfoSec can’t be an entry-level job, I do think there is a lot to learn elsewhere that can be extremely beneficial to a role in InfoSec. I recently went out on an engagement that required an incredibly deep understanding of routing and switching concepts. I am not talking about the magical skill of calculating subnets in your head (although I was able to do that at one point in my past). I was facing one of those security vs operational incidents I mentioned above.

I spent a lot of time in a network admin role. I took over management of a medium-sized business’s nationwide network. The network had previously been built (incorrectly) by a supposed networking expert. I spent a lot of time understanding what the problems were and addressed each of them as components of an overall problem. The result was a lot of positive comments from end users about the improved speed and reliability of the network. I ended up rebuilding about 90% of that network later on during a move from Frame Relay to MPLS. I spent time studying proper network design and function to make sure I was doing things correctly.

I mention this because the recent engagement I went out on involved a few components that at a glance could easily appear to be very serious security issues. A proper understanding of networking principles, and along with that the OSI model, was absolutely essential. There were many components that, when viewed as a whole, would lead down a ton of rabbit holes.

In Incident Response especially, we need the ability to view the problem as a whole, but also to break it down into its various smaller components. That is what an investigative / analytical mind does. Those components often are not all contributing to the problem; they are often a symptom or result of another problem. If you don’t have the knowledge to separate those components from the overall problem, then your incident is going to be much more difficult to resolve.

To those of you that are considered entry level:
  1. You can learn on the job, but you need to make sure that you take on a job that will give you that opportunity. Make sure that your role will be involved in technology across the board to get the exposure.
  2. Find a mentor that seems to be the right personality for you. That mentor can guide you to various topics that would be very beneficial to your career in InfoSec.
  3. Understand that there will be jobs requiring skills that you don’t possess. The postings don’t always reflect the true picture of whom that company is actually willing to hire.
  4. Ask your mentor for help in applying. Ideally, that mentor will be well connected in the industry and would have already started to expose you to various people around the industry. If that hasn’t happened yet, there might be a reason for it (maybe you aren’t ready), or your mentor might not be supporting you as well as needed.
  5. Show your efforts in learning. Make sure that people understand the time you are putting into improving yourself. This doesn’t mean that you constantly brag about yourself, but you can demonstrate your learning in many different ways.

InfoSec can be a tough place to work since we have to know a little about a lot. Embrace your curiosity.

James Habben
@JamesHabben

Thursday, July 20, 2017

Infosec Jobs

I unintentionally started a small storm on #infosec twitter the other day. In that storm I received responses in extremes that I didn’t even know existed. I wanted to give a bit more depth to that thread than what 140 characters can convey.

Let me clear the air a bit first. That was not an attempt to broadcast me searching for a job. I am not ‘on the market’ and no I didn’t apply to any of those jobs that I tweeted about. That was also not an attempt to phish for compliments, ego inflation, or many other interesting things I was accused of in private messages and subtweets.

The Backstory


Here is what happened. As a Senior Consultant for an incident response firm, I periodically take a look at other jobs that show up on the market for situational awareness. I have found that industry titles vary quite a bit, as the responsibilities of my Senior title seem to be equivalent to other firms’ Principal, Lead, Manager, or even Director titles. There are Managers that don’t manage, and there are Leads who do. It is pretty spread out.

While looking at one of those postings, I said to myself, “Hmmf. I can’t qualify for this job with the same title at a different firm.” I then did a mental inventory of my professional network and started identifying people I have connected with in the past that I could reach out to if I were to go through the application process for that job. That took me to another statement to myself: “How does someone with less time in the industry (and likely fewer connections) make it past any of these requirements?” Naturally, this mostly applies to folks that are very new to the industry. Then off to Twitter I went.

The Response


The responses that I received came in through all different avenues: text, LinkedIn, Twitter, email, even one on Instagram. I would probably have seen a few on Facebook as well, but I do not currently possess any credentials to log on to that god-forsaken website.

The responses also varied in their messages. I got a few that definitely do not need repeating, and I'm not exactly sure what the motivation behind them was. On the other extreme, I received a response from someone (or someones) in my network at every one of those companies that I mentioned in a tweet. Many graciously offered to put me in touch with someone who could get me hired there, bypassing the HR and recruiter filters to ensure that I was considered. I am thankful for all of those responses since it shows how much of a community I have with those individuals. That wasn’t my intent though, and I hope this post makes that more clear.

The Bypass


What makes me so special? Why did so many people offer to take me around the filters?

I have been in the industry for a really long time
This doesn’t mean I know what I am doing. At the most basic level, it means I have been able to fool enough people for a long enough time to stay employed. While that is not an accurate representation of me as a whole, length of time alone indicates nothing more.

I work for BigName firm
Yup, I do. I also know a lot of people who are very new to the industry, with a lot more to learn, who also work for big-name firms. This also does not show anything about me.

I have followers on Twitter
Ya, I have some. There are plenty of people that have tons more followers than I do. There are also folks that I know who are extremely knowledgeable and very good at their jobs with a double-digit follower count or no Twitter account at all. It is quite easy to look smart on the internet when I have time to plan and research what I decide to make public.

I have a blog to share my thoughts and research
Now we are getting somewhere. My posts on this blog are a far better representation of who I am than the things mentioned above. Although it is still quite easy to look smart here because of the time allowed for planning and research, what is more difficult to fake is communication skill. My writing here is a clear demonstration of my ability to convey points to a widely varied audience, although the majority of readers here seem to be more technically focused. We are finally looking at something that employers could use in their evaluation of me as a potential candidate.

I have spoken at conferences
Another point that gets more into who I am. We are also getting into an area that is a bit harder to fake. Sure, there is still prep and research involved before the point at which I deliver a talk, and I certainly do plenty of that. What is nearly impossible to fake is my presence in the room and my authority on a topic. As a bonus, some sessions I have delivered were recorded and are available publicly on the internet, and this allows me to provide another demonstration of myself to a potential employer.

I know lots of people
This seems to be the most significant point in this list. My time in this industry has put me in contact with a large number of people. People I have made enough of a personal connection with that they care about my wellbeing in terms of being employed. People that have calculated the risk involved, and are still willing to put their reputation and connections on the line to help me.

The Takeaway


Every job I have held has been through some connection. I have not received any jobs where I applied through some web portal. I have tried some of those in the past, and have many times not even received a courtesy rejection letter. My experience is often not categorized as ‘cybersecurity’ or ‘infosec’ depending on which recruiter I talk to.

If you are struggling with breaking into the industry, here are my pointers (which is really an echo of so many others’ great advice as well):

  1. Start a blog and post about stuff. It can be research, thoughts, infosec challenges, or any number of other topics. If you would like some help getting started on this, please feel free to reach out to me. I enjoy helping motivated people. The information you put in public can give employers more data to consume when you are short on other requirements.
  2. Work on your soft skills. I have a few posts up here about how soft skills can be improved in various ways, and I intend to continue these posts. You need soft skills in this industry if you want to get into the better jobs. Interviews will make or break your job application in the end.
  3. Go out and meet people. This can be virtually or physically. There are quite a number of people whom I would consider a step above acquaintance in terms of relationship, yet whom I have never met in person. For some, I might not even know their real names! Connections will get you in the door.

Last point: I am in no way criticizing the companies that put up those job listings. They have reasons for asking for those requirements, probably because something has bitten them in the past. I am not saying they should relax their requirements either. What you as an applicant have to recognize is that the requirements are not always requirements. You need to make meaningful connections to get access to the people that are making the hiring decisions.

I hesitated about initially sending those tweets, and again about writing this post. I am sure another round of hatemail will be heading my way soon. Also, it is not easy to publicly admit that I don’t qualify for so many positions in the industry I have been in for so long. I probably don’t even qualify for the requirements of my current position either. As someone facing the challenges of the hiring process, I hope this can give some comfort and help in your quest.

Good Luck!

James Habben
@JamesHabben

Friday, July 14, 2017

Compile Time Analysis of NotPetya

I had a thought the other day about some of the NotPetya / M.E.Doc (Medoc) initial infection vector details that were released last week. I wondered if the attackers had full and complete access to the Medoc network and even the source code. Did they have the ability to inject the malicious code into the source repository, then just sit back while it got included in the build that was released to customers? Or did they have more restrictive access to, say, the FTP server holding the update files?

Here are a couple of references for those that didn’t keep up with the fast-moving data related to this incident:
ESET wrote about the code that was injected into the ZvitPublishedObjects.DLL file

Cisco Talos wrote about their investigation on behalf of Medoc with access to internal servers

I wanted to see if I had some data to support the level of access the attackers had, even though I don’t have access to the internal systems. It is very possible that Talos has already made this determination and just not made the statement public.

Executable (exe) and Dynamic Link Library (DLL) files have a timestamp in the PE header known as the compile time. This is often used to identify characteristics of attackers, as many times they forget to cover their tracks with this field.
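
If you want to check this field by hand, it only takes a few lines of Python. Here is a minimal sketch using just the standard library; the offsets come straight from the published PE/COFF format, and the file path is a placeholder:

import struct
from datetime import datetime, timezone

def pe_compile_time(path):
    # Read the TimeDateStamp field from the COFF header of a PE file.
    with open(path, 'rb') as f:
        if f.read(2) != b'MZ':
            raise ValueError('not a PE file (missing MZ signature)')
        f.seek(0x3C)  # e_lfanew points to the 'PE\0\0' signature
        pe_offset = struct.unpack('<I', f.read(4))[0]
        f.seek(pe_offset)
        sig, machine, sections, stamp = struct.unpack('<4sHHI', f.read(12))
        if sig != b'PE\x00\x00':
            raise ValueError('PE signature not found')
    return datetime.fromtimestamp(stamp, tz=timezone.utc)

print(pe_compile_time(r'C:\samples\example.dll'))  # placeholder path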

In my inspection, it appears to me that the attackers had full and complete access inside the network with the ability to inject their malicious code into the source repository (whatever Medoc uses).

Single File

The first thing I did was use PEStudio to analyze the affected DLL file. The compile time in this view appeared to be within a normal time period based on other information that has been provided surrounding this attack.



The timestamp is shown in PT since my system has that set. The UTC time is 2017-06-21 14:58:42. I checked a couple other PE files and found it to be within the same time range.

Multiple Files

There are hundreds of files in the Medoc program folder and a large percentage of them are EXE or DLL. It makes no sense to check things one by one when we have the power of automation. Check out AdamB’s post on clustering analysis using compile times.

I took a similar approach and wrote up a quick EnScript to identify PE files and then parse the 4-byte compile time value and dump it out. I used this approach because EnCase allows me to analyze and extract data without having to mount or copy files out. Also, I have been writing EnScripts for a couple years. ;)
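
For those without EnCase, a rough Python equivalent of that batch approach could look like the sketch below. It reuses the pe_compile_time() helper from the sketch earlier in this post and prints the timestamps in chronological order so the build timeline is easy to eyeball; the folder path is a placeholder:

import os

def walk_compile_times(root):
    # Assumes pe_compile_time() from the earlier sketch is already defined.
    results = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(('.exe', '.dll')):
                try:
                    results.append((pe_compile_time(os.path.join(dirpath, name)), name))
                except (ValueError, OSError):
                    pass  # not a valid PE file, or unreadable
    for stamp, name in sorted(results):  # chronological order
        print(stamp.isoformat(), name)

walk_compile_times(r'C:\samples\medoc_program_folder')  # placeholder path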

The result tells me that the attackers got the code injected at the source. Look in the following screenshot and you won’t find any times that are overlapping. It also appears to have a reasonable amount of time between files based on their sizes.



Unknown

What I can’t answer with my own analysis based on the data available is whether the attackers stole the source code and compiled it on their own. According to Talos, they had access to change the NGINX configurations to have the internal server proxy connections out to a server under their control. Did the attackers inject the code into the internal repository? Did the attackers steal the source code and compile it outside of the Medoc network?


Hope you found this a nice, interesting little tangent from your daily tasks. I did!

James Habben
@JamesHabben

Thursday, July 13, 2017

Soft Skills: Respect

I'm sorry that you interpreted our discussion in the way that you did

I was recently the recipient of this statement. In the best case, it is frustrating to hear or read this. In the worst case, it can be depressing and completely demoralizing. Let me provide a few examples with a little elaboration:

As a teacher:
I’m sorry that you didn’t understand what I was saying. It is only my job to talk at you and you are responsible for listening and figuring out the message I was intending to deliver.

As a consultant:
I’m sorry that you didn’t get what I explained. I do this on a daily basis and know so much. I also don’t have time to explain these things to you.

As a friend:
I’m sorry you didn’t interpret that conversation correctly. I was trying to help you to be a better person, but you couldn’t get over yourself enough to hear what I was saying.

As a potential employer:
I’m sorry you understood my statement incorrectly. I hold all the power here and you should be bowing to me to show that you are worthy of a job here.

As a boss:
I’m sorry you misunderstood my instructions. You should have listened better. I know exactly what I was saying and it is your fault you don’t.


Own It

When you decide to take on the task of explaining something, it becomes your responsibility to ensure that all of the recipients correctly understand the message. If they don’t, it is YOUR FAULT. This may come as a news flash to some people, but there is no such thing as a mind reader. If you do not explain things in a way that people can understand, you are setting someone (or someones) up for failure.

From Jessica Hyde:
The phrase "I'm sorry" is absolutely meaningless when the onus is then placed back on the party being apologized to in the qualifier. Apologies should be formatted " I am sorry I..." not "I am sorry you...". Argh. The second is just rude.

From Mitch Impey:
the basic rules are valid in every industry and respect is key :)


Treat everyone with respect. If you don't, someday it will bite you.

James Habben
@JamesHabben

Wednesday, June 21, 2017

Fileless Application Whitelist Bypass and Powershell Obfuscation

Organizations are making the move to better security with application whitelisting, and the response from the offensive side of the computer security industry shows it. Frameworks such as Metasploit, PowerSploit, BeEF, and Empire are making it very easy to build and deploy obfuscated payloads in all sorts of ways. It has become so easy that I frequently see attackers using these techniques even on systems that do not employ the added security measures.

There are plenty of solutions to mitigate these types of attacks; however, I find they are not always configured properly. Take a read through @subTee’s Twitter feed and GitHub for many of the more creative ways he has shared. The attackers have raised the bar with the use of these techniques. If defenders aren’t deploying appropriate defenses, shame on them.

It Works


I wanted to share with you a few things from a recent engagement. The attacker had installed the backdoor almost a year before detection. They got in through a phishing attack, as in most cases. The detection? A kind and friendly letter from a law enforcement agency that had taken control of the command and control (C2) server and was observing traffic to identify victims. The beaconing was surprisingly frequent given how careful the attacker was in some other areas.

Can you confidently say that your endpoints are safe from these types of attacks? You don’t have to deploy prevention or detection tools for every part of the kill-chain, but you would be best served to have at least one. Or not, YOLO.

Persistence


In order for any malware to be effective, it has to run. I know, a revolutionary statement. It is a concept that is missed by some, and it is a very critical piece. There are a finite number of places that give malware the ability to get started after a system has been rebooted. Keep in mind that the user login process is a perfectly acceptable trigger mechanism as well, and there are a finite number of places related to it too.

Just like the various creative and new application whitelist bypass techniques, there are creative and new persistence mechanisms found periodically. Adam has posted quite a few of them on his blog. The good news is that the majority of attacks don’t get that creative because they don’t have to.

The run mechanism in this system was HKCU\Software\Microsoft\Windows\CurrentVersion\Run




You can see that the attacker has chosen to use cmd to start mshta. The code following that command is JavaScript that, when run, creates an ActiveX object that loads more code from a registry path. So many layers!
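
If you want to hunt for this kind of persistence on your own systems, a quick triage sketch using Python's built-in winreg module (Windows only; the keyword list is just my own example, not a complete detection) might look like this:

import winreg

RUN_KEY = r'Software\Microsoft\Windows\CurrentVersion\Run'
SUSPICIOUS = ('mshta', 'javascript', 'activexobject', 'powershell')

def check_run_values():
    # Walk every value under the HKCU Run key and flag suspicious commands.
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, RUN_KEY) as key:
        index = 0
        while True:
            try:
                name, value, _ = winreg.EnumValue(key, index)
            except OSError:
                break  # no more values
            if any(word in str(value).lower() for word in SUSPICIOUS):
                print(f'[!] {name}: {value}')
            index += 1

check_run_values()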

Obfuscation


The run mechanism loads in code that has been obfuscated by the attacker. It starts off creating another ActiveX object and then uses powershell.exe to interpret the code that follows. The obfuscation is enough to prevent keyword searches from hitting on some of the known API functions involved with these attacks, but it is not a difficult one to break. All you need is a base64 decoder. I recommend using a local application, since you never know what kind of thing will show up, and an online JavaScript-based decoder is susceptible to getting attacked, whether intended by the attacker or not.

The path referenced in the run value and pictured below is HKCU\Software\Licenses. I have blurred some code and value names out of an abundance of caution for potential unique identifiers.




Decoding


My preferred tool for decoding this is 010 Editor. It is not free, but it is worth its license cost for so many things.

First thing to do is copy the text inside those quote marks. Don’t include the quotes since that will throw off your base64 decoding.

Now you just create a new document in 010 and use edit > paste from > paste from base64.
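
If you prefer scripting the decode, the same thing can be done offline with a few lines of Python. The sample value here is made up for illustration; paste in whatever you pulled from the registry:

import base64

blob = 'cABvAHcAZQByAHMAaABlAGwAbAA='  # hypothetical sample data

decoded = base64.b64decode(blob)
# Obfuscated PowerShell stagers are often UTF-16LE; fall back to latin-1.
try:
    print(decoded.decode('utf-16-le'))
except UnicodeDecodeError:
    print(decoded.decode('latin-1'))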




Magically you have some evil looking PowerShell code.






Take a look over at this PowerShell code from @mattifestation and you will hopefully notice that it follows the same flow. It looks like someone simplified the code from the blog post by removing the comments and shortening the variable names. Otherwise it is identical.

Payload


Line 2 of the PowerShell code loads the registry data from a different value in the same path. Line 14 then copies the binary data from the variable into the memory space of the process that was created, about 15 KB of it. Line 15 then kicks it off, and the binary code takes over.




The binary is shellcode that decompresses a DLL image with aPLib and writes it into the same process space. The resulting DLL has not been identified by any public resources, so I can’t share it with you here. It is very similar to Powersniff and Veil, for those interested in deeper analysis.

Raise Your Bar


Defenders, the bar has been raised by the attackers. Make sure that you are following suit, or better yet, raising it even higher.

James Habben
@JamesHabben

Wednesday, June 14, 2017

Layers Are Important

We in InfoSec chant it often and for some of us it might even be a daily mantra. “Use Multi-Factor Authentication!” (MFA) Sometimes called Two Factor Authentication (2FA), it adds an additional layer of security to your organization that almost allows for the use of ‘password’ as a password.

If you keep up with the Verizon Data Breach Investigations Report, you should already know that user credentials are the most sought-after piece of information across all the incidents. With that kind of data supporting a solution, it is still a bit surprising how many organizations out there are exposing services to the public internet without the extra layer(s) of authentication.

More Layers


As great as MFA/2FA is, it will not eliminate all of your problems. I had a troublesome case recently that involved phishing, exposed web services, Remote Access Tools (RAT), stolen credentials, and more. The part that made it really scary was how the attackers were able to figure out the infrastructure enough to almost get VPN access.

The attackers got access to email. Through email they were able to social engineer their way into quite a few areas. One of those areas was how employees obtain the token software and keys for VPN access. Let me restate that with a little more clarity. The attackers requested and got access to VPN tokens used as a part of the MFA/2FA protection.

The process of getting approved for VPN was quite a lengthy one; I know because I had to go through it for remote access as part of managing the incident. After struggling to get myself access, I was astounded that the attackers were able to get so far. It took me quite a while to work through the protections, even with the guys on the phone walking me through it all.

Simple Works


You know what stopped the attackers? A registry key. Nothing functional. Just a simple registry key that gets injected onto company assets. The VPN login process runs a full posture check validating your patches, anti-virus program version, firewall configuration, agent installs, etc., and part of that process includes checking for the existence of a simple registry key.
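
A check like that is trivial to script, which is part of its charm. Here is a hedged sketch of the idea in Python; the key path is made up for illustration, since every company would pick its own marker:

import winreg

MARKER_KEY = r'SOFTWARE\ExampleCorp\AssetMarker'  # hypothetical path

def is_company_asset():
    # Pass the posture check only if the marker key exists on this machine.
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, MARKER_KEY):
            return True
    except FileNotFoundError:
        return False

print('posture check passed' if is_company_asset() else 'access denied')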

It might sound silly amidst discussions about all this high tech prevention and machine learning analysis, but sometimes simple works. Don’t overlook the basic protections. They add layers of protection that just might actually be the one piece that saves the day.


James Habben
@JamesHabben

Tuesday, May 30, 2017

Soft Skills: Be Present

On the heels of an industry conference, there are so many emotions running through me. Excitement - to apply new techniques and tools to my work. Frustration - that I didn’t get over my shyness to engage with others that also looked shy. Happiness - that I got to see friends from around the world that would otherwise be logistically difficult. Pride - that I didn’t screw up too badly while talking in my sessions. Exhaustion - that I didn’t get enough sleep because there are only 24 hours in a day. This time for me, it was Enfuse 2017.

In reflection, there was one trend that I noticed quite a lot during the conference. Many people were not being present in their conversations with others. I saw this in hallways between sessions, during mealtimes, and at the various parties. I wasn’t immune either, as I caught myself a couple times as well. There is always a lot going on at conferences, and that makes it especially hard to stay focused on the current engagement. This is one of the best times to either start building or further reinforce a connection with other like-minded folks in the industry. Some call it networking, although I prefer the word connecting because I feel that ‘networking’ doesn’t convey the right meaning.

Networking is when you go to an evening mixer party with a stack of business cards hoping that the numbers will work for you. The larger the number of people that have your card, the more likely you are to get contacted about something. That something might be a sales lead, a job opportunity, or even a free meal. This is not a bad thing.

Connecting is when you spend time to get to know a person. The key difference is how you engage. You focus on the one or few people in the circle and you pay attention to those people. You listen to the conversation and interact.

Some focus points to be present:
  1. Keep your phone in your pocket, purse or bag
  2. Turn your phone alerts off if you are too easily distracted
  3. Look at the person talking, not behind or beside
  4. Point your feet at the person (or group) to help keep your body engaged

Some points to help others be present:
  1. In a networking/connecting event, don’t latch onto one person and prevent them from being able to make other connections
  2. If you notice another person drifting away from you, politely bring it into conversation to either lock in attention or give the opportunity to disengage
  3. Pay attention to your own behavior to ensure you aren’t causing someone to drift
  4. Respect other people’s conversations - don’t barge in and take over

Any other tips you have to be present?

UPDATE: Reading Material


How To Win Friends & Influence People by Dale Carnegie
Part Two, Section 6 - How to Make People Like You Instantly

Key point: Make the other person feel important - and do it sincerely.

This book was originally written in 1936 and is still considered one of the best on this subject. It is referenced by almost every book that covers these ideas. You will serve yourself well by reading this book, and not just once.

This chapter gives many examples of situations on both sides of this recommendation - making yourself the most important person and showing others that they are important. It is a great read with a lot of perspective.


There is nothing more frustrating to a person than feeling like the other person doesn’t value the discussion. Some people do love to talk for hours regardless of whether anyone is actually listening, but I will hold that discussion for another time. If you don’t want to be there, respectfully disengage. If you want to be there, be there.

James Habben
@JamesHabben

Wednesday, May 10, 2017

Real Self Improvement

This Digital Forensics and Incident Response (DFIR) industry attracts a lot of hard-working individuals. Curiosity is what has stood out to me the most in all the people that I have talked to. We have an internal drive to find out how things work, and it is not satisfied until we know every part. This is a big part of what makes us stick to a job that can sometimes seem like a battle that can never be won.

The Ongoing Battle


The battle we face is a constant discovery of new artifacts and techniques. These come from both the offense side and the defense side. We don’t all have time to research these on our own, and the community is fortunately very supportive in that there are blogs detailing these findings. The offense finds a new hole and shares it with their like-minded folks. Then, often, the defense finds a way to detect or monitor it, and there is more sharing with that like-minded community. You only need to see the list of links for a one-week period on thisweekin4n6.com to understand the volume and the community we have.

Constant Improvement


Because of the community, there are tons of resources explaining all the technical loveliness that we all enjoy. Improving our technical skills is a very achievable task. The reality is that some of the skills I learned to examine Win2k systems are (thankfully) starting to fade. Our tech changes with rapid speed.

What about our non-technical skills? Do you make any effort to improve how you interact with other people? These are often referred to as ‘soft skills’ and you will find them listed, in some form or another, on every job opening.
  • Strong communication skills
  • Ability to convey technical concepts to others
  • Be a team player
  • Comfortable speaking to a crowd
In fact, you might have witnessed a peer getting a promotion instead of you, even though you have proven multiple times that you are far more technically capable than this peer. Your technical skills were likely not even part of the consideration for that promotion, as the soft skills matter much more when moving up.

Steps


The first step is always to realize. I won’t call this a problem because I don't see it as such. It is a deficiency, and one that can easily be corrected if you will first make that realization.

Next, make a commitment to improve. I mean a real commitment. You won’t make much progress if you don’t take it seriously. Improving soft skills is a whole lot harder than improving your technical skills. You cannot do it alone.

Find someone to help you be accountable. This can be a sibling, friend, classmate, coworker, workout partner, or even someone you just met at a local association meetup. The important thing to find in this person is the willingness to call you on the carpet if you are not following through. You know yourself best and what type of person you would be most receptive to.

Find a mentor (or two). This mentor doesn’t have to be someone in the DFIR industry since soft skills are pretty universal. In fact, you might find some extra insight from someone outside your circles. Don’t be afraid to aim high either. For the most part, I have found that people are very willing to give advice all the way up through the C-suite. If there is someone who you admire for a certain trait, go talk to them and find out about the struggle they had to gain that trait. There is an interesting program called infosecmentors.com that might be a good start.

Lastly, don’t waste time. This is one of the only things in this world that we can’t just make more of. We can make more money. We can learn more things. We can drink more whiskey. We can’t take back the hour that we sat listening to that one guy who just wanted to blabber on and on about the things only he thought were important. Be respectful of your own time and of anyone else you ask for time from. These people will want to see improvements made, or they will start to see time spent with you as a waste. Set an expectation of time with a person and don’t waste it.

More to Come


I have seen and heard a lot of discussion about soft skills in more recent times. I initially wanted to put together another ‘must read book list’, but I decided that I would take a little more time and talk about some various soft skills that we can work on improving together. I will be writing about these in future posts and I will provide information about some of the books that I continue to use in my path of improvement. This can be an intimidating set of skills to improve, and I want to help you do it.


James Habben
@JamesHabben

Monday, April 3, 2017

CCM_RecentlyUsedApps Update on Unicode Strings

The research and development that I did previously for the CCM_RecentlyUsedApps record structure and EnScript carving tool was done against case data I was using during investigations. Unfortunately, I had no data available in which any of the string data had been written in Unicode characters. With the thought that Windows has been designed with international languages in mind, I used the UTF8 codepage when reading, to hopefully catch any switch to Unicode-type characters. UTF8 is a very safe alternative to ASCII because it is identical to plain ASCII in the lower range and starts expanding to multiple bytes for higher code points. I have an update, however, because a volunteer from Twitter graciously did some testing. Thanks @MattNels for the help!

The Tests


The first test that he ran used characters that are not in the standard ASCII range. Characters like ä or ö are Latin-based characters with umlaut dots above, and they fall within the scope of extended ASCII when you include both the low and the high ranges.

He created a testing directory on his system, which is under the management of his company’s SCCM deployment services. If you recall from my prior posts on this subject, this artifact is triggered simply by the system being a managed member. In this directory, he renamed an executable to include the above-mentioned characters from the high ASCII range. The results show that the record stored those high characters exactly the same as the low-range characters. You can see what that looks like in the following image.


The next test he ran was to rename that executable again to something high enough in the Unicode range to get clear of the ASCII characters. He went with “秘密”, which consists of the two code points 0x79d8 and 0x5bc6. Keeping in mind our little-endian CPU architecture, we know that those bytes have to be swapped when written to disk as Unicode (UTF-16LE) characters. The text would translate to four bytes on disk: d8 79 c6 5b.

Another option, going with my earlier assumption/guess, is for the string to be written using UTF8. The use of UTF8 is pretty common on OS X and less common on Windows, in my experience. Nevertheless, it is worth knowing what the bytes would look like if it were UTF8. The above glyphs translate into six bytes on disk, three for each character, and we don’t swap the bytes around like we did with Unicode. Confusing, right? Anyways, those bytes would look like this on disk: E7 A7 98 E5 AF 86.
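
You can verify both byte sequences yourself with two lines of Python (the separator argument to hex() needs Python 3.8 or newer):

text = '秘密'
print(text.encode('utf-16-le').hex(' '))  # d8 79 c6 5b (bytes swapped)
print(text.encode('utf-8').hex(' '))      # e7 a7 98 e5 af 86 (no swapping)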

Drumroll please…

The result was evidence of a switch to Unicode. You can immediately recognize it as Unicode because of the 0x00 bytes between the characters of the file’s “.exe” extension. If you use a hex-to-ASCII converter on the Unicode bytes from above (d8 79 c6 5b), you get back “ØyÆ[“, which lines up nicely with the following image.



Now you ask: How do we programmatically determine if the string was written using Unicode or ASCII? Excellent question, and I am glad that you are tracking with me!

Let’s expand the view of this record a bit, and recall the structure of the format from the last post. Strings in Windows are typically followed by a 0x00 (null) byte to indicate where the string data stops. These are referred to as C-style strings because this is how the C programming language stores strings in memory. In this record, however, the strings were separated by two 0x00 bytes. Take a close look at the following image of the expanded record with the Unicode string.



Did you spot the indicator? Look again at the byte immediately preceding the highlighted string data, and you will see that it is a 0x01 value. This byte was a 0x00 value in all of my testing because I didn’t have any strings with Unicode text in them, or at least not to my knowledge. Since executables need to have these Latin-based extensions, the property will actually appear to end with three 0x00 bytes. The first of those is actually part of the preceding 'e'. Since this string has been written entirely in Unicode, the null terminating character mentioned just above gets expanded as well. The next byte is then either a 0x00 or 0x01, indicating the codepage for the next string property.
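
In parsing terms, the logic described above boils down to something like this sketch (the buffer layout follows the record structure as described here; the sample data is my own construction):

def read_ccm_string(buf, offset):
    # One string property: a codepage indicator byte, then the string itself.
    flag = buf[offset]
    offset += 1
    if flag == 0x01:  # Unicode (UTF-16LE), terminated by an expanded 0x00 0x00
        end = offset
        while buf[end:end + 2] != b'\x00\x00':
            end += 2
        return buf[offset:end].decode('utf-16-le'), end + 2
    # 0x00: plain ASCII, terminated by a single 0x00 byte
    end = buf.index(b'\x00', offset)
    return buf[offset:end].decode('ascii'), end + 1

sample = b'\x01' + '秘密.exe'.encode('utf-16-le') + b'\x00\x00'
print(read_ccm_string(sample, 0)[0])  # 秘密.exe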

An interesting side note on a situation that Matt ran into: the use of the path “c:\test2\秘密\秘密.exe” for the executable resulted in no records indicating execution. He ran a number of tests around that scenario, and there is something about that path that prevents the recording.

He continued with changing the path to “c:\秘密\秘密.exe”, and the artifact was back. We wanted to get confirmation of that 0x01 indicator byte using another string value. Sure enough, we got it in the following image.




Tool Update


The EnScript that I wrote to carve and parse these records has been updated to properly look for the 0x00 and 0x01 bytes indicating ASCII or Unicode usage. Please reach out to me if you find any problems or have any questions.

Additionally, Matt is adding this artifact to his irFARTpull PowerShell collection. These artifacts can be collected by having PowerShell perform a WMI query against the namespace and class where these records are stored. It should look something like this:
Get-WmiObject -namespace root\ccm\SoftwareMeteringAgent -class CCM_RecentlyUsedApps

Lessons Learned


This is a perfect example of being aware of what your tools are doing behind the scenes and always validating and testing them. Many of the artifacts that we search for and use to show patterns of behavior are detailed through reverse engineering. This process can be helpful, but it can also be a bit blind in not being able to analyze what we don’t have available.

If you aren’t a programmer, you can still contribute with testing, or even just thoughts on possible scenarios of failure. Hopefully the authors of the tools out there will be accepting of the feedback, as it will only provide more benefit for the community.

James Habben
@JamesHabben

Tuesday, March 28, 2017

Windows Prefetch: Tech Details of New Research in Section A & B

I wrote previously with an overview of the research into Windows prefetch I have been working on for years. This post will get more into the technical details of what I know, to help others take the baton and get us all to a better understanding of these files and the Windows prefetch system.

I will be using my fork of the Windows-Prefetch-Parser to display the outputs when parsing this data. Some of the trace files I use below are public, since my generated sample files didn't have certain characteristics needed to show all the scenarios.

Section A Records


I will just start off with a table of properties for the section A records, referred to as the file metrics. The records are different sizes depending on the version. I have been working with the newer version (winVista+), and it has just a tad more info than the XP version.

Section A Version 17 format (each field 4 bytes)
0: trace chain starting index id
4: total count of trace chains in section B
8: offset in section C to filename
12: number of characters in section C string
16: flags

Section A Version 23 format (each field 4 bytes, except as noted)
0: trace chain starting index id
4: total count of trace chains in section B
8: count of blocks that should be prefetched
12: offset in section C to filename
16: number of characters in section C string
20: flags
24 (6 bytes): $MFT record id
30 (2 bytes): $MFT record sequence update

As you can see between the tables, the records grew a bit starting with winVista to include more data. The biggest difference is the $MFT record reference. It is very handy to know the record number and the sequence update to be able to track down previous instances of files in $LogFile or $UsnJrnl records. The other added field is a count of blocks to be prefetched. There is a flag setting in the trace chain records that allows the program to specify if a block (or group) should be pulled fresh every time, somewhat like a web browser.
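
For reference, a version 23 file metrics record can be unpacked with a few lines of Python, with the field layout taken straight from the table above:

import struct

def parse_metrics_v23(buf, offset=0):
    # Unpack one 32-byte version 23 section A (file metrics) record.
    (chain_index, chain_count, prefetch_blocks,
     name_offset, name_chars, flags) = struct.unpack_from('<6I', buf, offset)
    mft_record = int.from_bytes(buf[offset + 24:offset + 30], 'little')
    mft_sequence = int.from_bytes(buf[offset + 30:offset + 32], 'little')
    return {'chain_index': chain_index, 'chain_count': chain_count,
            'prefetch_blocks': prefetch_blocks, 'name_offset': name_offset,
            'name_chars': name_chars, 'flags': flags,
            'mft_record': mft_record, 'mft_sequence': mft_sequence}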

The flag values seem to be consistent between the two versions of the files. This field applies a general setting to all of the blocks (section B) loaded from the referenced file, but I have seen times when the blocks in section B were assigned a different flag value. Mostly, they line up. Here are the flag values:

Flag values (integer bytes have been flipped from disk)
0x0200    X    blocks (section B) will be loaded into executable memory sections
0x0002    R    blocks (section B) will be loaded as resources, non-executable
0x0001    D    blocks should not be prefetched

You can see these properties and the associated filenames in the output below. You will notice that the $MFT has been marked as one that shouldn’t be prefetched, which makes a lot of sense: you don't want stale data there. The other thing is that there are a couple of DLL files referenced with XR because they are being requested to provide both executable code and non-executable resources.


Section B Records


This section has records that are much smaller, but there is so much more going on. The most exciting part to me is the bitfields that record usage over the last eight program runs. You have probably seen these bitfields printed next to the file resource list in the Python output when running the tool, but that data is not associated with either the filename in section C or the file metrics records in section A. These bitfields actually track each of the block clusters in section B, so the printed output is a calculated value combined from all associated section B records. I will get to that later. Let’s build the property offset table first. These records have stayed the same over all versions of prefetch so far.

Section B record format
0 (4 bytes): next trace record number (-1 if last block in chain)
4 (4 bytes): memory block offset
8 (1 byte): Flags1
9 (1 byte): Flags2
10 (1 byte): usage bitfield
11 (1 byte): prefetched bitfield

The records in this section typically point to clusters of eight 512-byte blocks that are loaded from the file on disk. Most of the time, you will find the block offset property walking up in increments of 8. It isn’t a requirement though, so you will find smaller intervals as well.

Here is an example of these records walking by 8.


Here is an example of one record jumping in after 2.


Here is an example of a couple sequential records, jumping only by 1.



I broke the two flag fields up early on just to be able to determine what was going on with each of them. What I found was that Flags2 always holds a value of 1. I haven’t ever seen this change. Without a change, it is very difficult to determine the meaning of this value and field. I have kept it separate precisely because it never changes.

The Flags1 field is similar to the Flags field found in the section A records. It holds values for the same purposes (XRD), though the number values representing those properties aren’t necessarily the same. It also has a property that forces a block cluster to be prefetched as long as it has been used at least once in the last eight runs. I will get more into the patterns of prefetching that I have observed later, but for now let’s build the table of the properties and their values.

0x02    X    blocks are loaded as executable
0x04    R    blocks are loaded as resources
0x08    F    blocks are forced to be prefetched
0x01    D    blocks will not be prefetched

Now I get to show my favorite part: the bitfields for usage and prefetch. They are each single-byte values that hold eight slots in the form of bits. Every time the parent program executes, the bits are all shifted to the left. If the block cluster is used or fetched, the rightmost bit gets a 1; otherwise it remains 0. When a block cluster's usage bitfield ends up all 0s, that block record is removed and the chain is resettled without it.

Imagine yourself sitting in front of a Scrabble tile holder. It has the capacity to hold only eight tiles, and it is currently filled with all 0 tiles. Each time the program runs and that block cluster is used, you push a 1 tile on from the right side. If the program runs and the block cluster is not used, then you push on a 0 tile. Either way, you are going to push a tile off the left side because there isn't enough room to hold a ninth tile. That tile is now gone and forgotten.
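
The tile analogy translates directly into two lines of bit math. A quick simulation with a made-up usage pattern shows the window sliding:

def record_run(bitfield, used):
    # Shift the 8-run window left and record this run in the rightmost bit.
    return ((bitfield << 1) | int(used)) & 0xFF

usage = 0
for used in [True, False, True, True, False, False, True, True]:
    usage = record_run(usage, used)
    print(f'{usage:08b}')
# Once eight more runs pass, the oldest bit falls off the left side.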

Prefetch Patterns


The patterns listed below occur in section B, since this is where the two bitfields are housed. Remember that these apply to block clusters and not to entire files. Here are various scenarios around the patterns that I have seen. Assume that neither the D nor the F property is assigned unless specified. Also, none of these are guaranteed; I have simply observed them and noted the pattern at some point.

Block with the F (force prefetch) property assigned, after 1 use on 8th run:
10000000    11111111

Block with the D (don’t prefetch) property assigned, after a few uses:
01001011    00000000

Block that is generally used, but missed on one:
11011111    11111111

Block on first use:
00000001    00000000

Block on second run, single use:
00000010    00000001

Block on third run, single use:
00000100    00000011

Block on fourth run, single use:
00001000    00000110

Block used every other run:
01010101    00111111

Block used multiple times, then not:
01110000    00111111

Block used multiple times, but only one use showing:
10000000    11100000

More Work


I am excited to see what else can be learned about these files. My hope is that some of you will take this data, test it, and break it. You don’t have to be the best DFIR person out there to do that. All you need is the drive to learn.

James Habben
@JamesHabben

Sunday, March 26, 2017

Windows Prefetch: Overview of New Research in Sections A & B


The data stored in Prefetch trace files (those with a .pf extension) is a topic discussed quite a bit in digital forensics and incident response, and for good reason. It provides a great record of the executables that have been used, and Windows is configured to store them by default for workstation systems. In this article, I am going to add just a little bit more to the type of information that we can glean from one of these trace files.

File Format Review


The file format of Prefetch trace files has changed a bit over the years and those changes have generally included more information for us to take advantage of in our analysis. In Windows 10 for example, we were thrown a curve ball in that the prefetch trace files are now being stored compressed, for the most part.

The image below shows just the top portion of the trace files. The header and file information sections have been the recipient of the most version changes over the years. The sections following are labeled with letters as well as names according to Joachim’s document on the prefetch trace file format. The document does state that the name of section B is only based on what is known to this point, so it might change in the future. I hope that image isn’t too offensive. Drawing graphics is not a specialty of mine.




New Information, More Work


The information that I am writing about here is the result of many years of drawn-out, noncontiguous research. I have spent way too much time in IDA trying to analyze kernel-level code (I probably should just bite the bullet and learn WinDbg) and even more time watching patterns emerge as I stare deeply into the trace file contents. It is not fully baked, so I am hoping that what I explain here can lead others, smarter than me, to run with this even further. I think there are more exciting things still to be discovered. I have added code to my fork of the windows-prefetch-parser python module, which I forked a while back to add SQLite output, and I will get a pull request into the main project in short time. This code adds just a bit of extra information to the standard display output, but there is also a -v option to get a full dump of the record parsing. (Warning: lots of data.)

File Usage - When

The first and major thing that I have determined is that we can get additional information about the files used (section C): we can determine which of the last 8 program executions took advantage of each file. We have to combine data from all three sections (A, B, and C) in order to get this more complete picture of what the Windows prefetcher refers to as a scenario. This can also help explain why files can show up in trace files and randomly disappear some time later. Take a look at this image for a second.


This trace file is for Programmer’s Notepad (pn.exe), which was executed on a Windows 8 virtual machine. I created several small, unique text files to have distinct records for each program execution. I used the command line to execute pn.exe while passing it the name of each of those text files. I piped the output into grep to minimize the displayed data for easier understanding here.

There are two groups of 8 digits, and each is a bitfield. The left group represents the program triggering a page fault (soft or hard) to request data from the file. The right group represents the prefetcher doing a proactive grab of the data from that file, as this is the whole point: to have data ready for the soft fault and to prevent the much more costly hard fault. In typical binary representation, a zero is false and a one is true. Each time the program is executed, these fields are bitshifted to the left. This makes the right side the most recent execution, and each column working left is the scenario prior, going up to eight total.

If you focus on an imaginary single file being used by an imaginary program, the bitfield would look like this over eight runs.
00000001
00000010
00000100
00001000
00010000
00100000
01000000
10000000

What happens after eight runs? I am glad you asked. If the value of this bitfield ends up being all zeros, the file is removed from section C, and all associated records are removed from sections A and B. Interestingly, the file is not removed from the layout.ini file that sits beside all these trace files; at least not immediately, from what I have been able to determine.

If the file gets used again before that 1 gets pushed out, then the sections referencing that file will remain in the trace file.
00000001
00000010
00000100
00001000
00010001
00100010
01000100
10001000
00010000
00100001
01000010
10000100
00001000
etc.
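
Put another way, you can turn one of these bitfields back into "which of the last eight runs touched this file". A small helper (my own illustration, not part of the parser output) makes that explicit:

def runs_that_used(bitfield):
    # Run 1 is the most recent execution, run 8 the oldest still tracked.
    return [run for run in range(1, 9) if bitfield & (1 << (run - 1))]

print(runs_that_used(0b10001000))  # [4, 8]: used four and eight runs ago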

File Usage - How


The second part, and the one that needs more research, is how the file was used by the executing program. There are some flag fields in both sections A and B that provide a few values that have stuck out to me. There are other values that I have observed in these flag fields as well, but I have not been able to make a full determination about their designation yet.

The flag field that I have focused on is housed in section A. The three values that I have found purpose behind seem to represent 1) whether a file was used to import executable code, 2) whether the file was used just to reference some data, perhaps strings or constants, and 3) whether the file was requested to not be prefetched. You will mostly see DLL files with the executable flag, although some are referenced as a resource. You will find most of the other files being used as a resource.

In the output of windowsprefetch, I have indicated these properties as follows:
X    Executable code
R    Resource data
D    Don’t Prefetch

See some examples of these properties in the output below from pn.exe.



More Tech to Follow


I am going to stop this post here because I wanted this to be more of a high-level overview of the ways we can use these properties. I will be writing another blog post that gets into a little more of the gory detail of the records for those that might be interested.

Please help the community in this by testing the tool and the data that I am presenting here. Samples are in the GitHub repo. This has all been my own research, and we need to validate my findings or correct my mistakes. Take a few minutes to explore some of your system’s prefetch files.

You can comment below, DM me on twitter, or email me first@last.net if you have feedback. Thanks for reading!

James Habben
@JamesHabben

Friday, March 17, 2017

BsidesSLC Experience and Offer to Help

I was given the privilege of speaking at the BsidesSLC conference this month, and it was a very enjoyable conference for me. The people in the SLC area are very welcoming and the crew that puts the conference on did an amazing job. The name of the conference is changing for next year, but the format is staying pretty much the same. If you have the ability to attend next year, I would highly encourage you to do so.


Here are some points that I picked up during my attendance:

Bryce talked about the well-known issue of developers posting secrets to code repositories such as GitHub or BitBucket. The funniest part is that these developers realize their mistake and commit a revision to remove the secrets. What happens to the previous commit? Exactly! This same mistake is made by even more developers when you include other cloud technologies like S3 storage. That WordPress vulnerability that allows file injection can lead to a complete meltdown when the attacker accesses all of your data stored inside S3 or other systems. Keep your secrets secret.

Bri explained the challenges in compromising Industrial Control System (ICS) devices. Getting the highest level of privilege on a system doesn’t automatically mean the compromise of the connected devices. There is a secondary payload required to further infiltrate and that secondary payload requires expert knowledge of the ICS being targeted. We aren’t yet at the point of having commoditized malware for ICS.

JC walked us through how he runs tabletop exercises for his clients. There wasn’t anything new for me in this one, but it was a great reassurance that I have been facilitating a quality exercise for all of my clients. I think the attendees should take away that there really needs to be an externally hired facilitator to run some of their exercises, to work around internal politics or bias. Mr. ‘Junior Infosec' may not feel comfortable calling out the CEO for a wrong answer, but I am happy to do it.

Chad gave us an earful of all the various ways that Windows credentials can be picked and harvested by attackers, both on the wire and on the disk. He even provided a handout with all the additional notes he talked about. This is a very important topic to be aware of because the DBIR has consistently shown that credentials are the most targeted in incidents and breaches. Defenders need to be aware of every possibility of credential compromise in order to put safeguards in place.

Lastly, Lesley gave an inspiring talk about how we as an industry have the collective skill to land a plane while not being professional pilots (at least most of us). She went through a great demonstration showing how every person (not an exaggeration) can contribute in some way to improving the security field. We just have to look at ourselves, identify the skills we have, and offer that help to others who are trying to learn. No one in this field is an expert at everything, even though that's hard to believe given the reputation that follows many people. We all have skills, and we all have something we want to learn.

My Offer to Help


I consistently see advice given to new folks in the field, or those trying to get into the field, that blogging is one of the best ways to do it. A blog allows you to demonstrate the skills you have and gives you a reference on your resume. You don’t have to post about the latest research on the newest malware. Focus on the skills you have that you can share with others, or document your journey of learning a new skill. Communication is a critical skill in this industry, and I challenge you to find a job listing that doesn’t ask for someone with ‘good communication skills’ or the ‘ability to explain technical concepts’. Blogging is pure demonstration of that ability.

I want to put the offer out there to anyone who wants to get into blogging but is too shy to get it rolling. If you enjoy my style and reading my posts, then reach out to me so that I can help you. I can help you to organize your thoughts into a post that flows. I can help you come up with topics. I can help you improve on your writing skills. I am even happy to have you post on this blog.

My DMs are open on twitter, and my email is first@last.net. Your move.

James Habben
@JamesHabben

Tuesday, March 14, 2017

CCM_RecentlyUsedApps Properties & Forensics

UPDATE 2017-04-03: Unicode strings are used when needed. See the update post.

You can uncover an artifact from the deepest and darkest depths of an operating system and build a tool to rip it apart for analysis, but if everybody stares at it with a confused look on their faces, it won’t gain acceptance and no one will use this new thing you built. Something about forensics, Daubert, Frye, etc., not to mention plain reasoning.

With that said, this post is a followup to my previous post about the Python and EnScript carving tools that can be used to analyze data from the WMI repository database, and more specifically, the class CCM_RecentlyUsedApps that is contained within. That post was about the structure of the records, and how to locate and then parse the meaningful data into property lists. This post is about what these properties mean and how they can be used.

Header Data


The indexing of the WMI repository uses hashes to better store and locate the various namespaces and classes in the file. These hashes are placed at the beginning of each of these records. The way the hashes are calculated is discussed in the previous post.

There are two date properties in the record header, each stored as an 8-byte Microsoft FileTime value. Both of these dates are stored in UTC. Because these dates are part of the record header, they will be found on records of all types of classes, not just those being used for CCM_RecentlyUsedApps tracking.

Timestamp1 indicates the last date the system had some sort of check-in or assessment from the SCCM server. It will be the same for all actively allocated records. You will very likely find previous dates on some records when using the carving method since there are records that get deallocated but not overwritten. The systems that I have analyzed these artifacts from have all had roughly a week between the various dates. I suspect this is a configuration setting that an SCCM admin would be able to modify.

Timestamp2 seems to indicate when the system was last initiated to join SCCM. This will be the same for all records, even with the carving method. The only reason this date would change on some records is if the system was removed from SCCM management and then joined again. This date has always lined up well, in my research and investigations, with other artifacts that support an action of joining an SCCM management group, such as services being created or drivers installed.
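
For reference, here is a minimal Python sketch for decoding these 8-byte values into readable dates (my assumption of a straightforward little-endian read, matching the layout described above):

import struct
from datetime import datetime, timedelta, timezone

def filetime_to_datetime(raw8):
    # FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC
    ticks = struct.unpack('<Q', raw8)[0]
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=ticks // 10)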

Numeric Record Data


There are 3 numeric properties stored in the record data: Filesize, ProductLanguage, and LaunchCount. None of these are going to sound any alarms on their own, but they can help paint the picture when combined with the rest of the properties.

Filesize is a four byte field that tracks the size in bytes of the executable for the record. Depending on whether the developer used a signed or unsigned type, four bytes has a max value of 4GiB (unsigned) or 2GiB (signed). If you have a bunch of Adobe products on your systems, you might run into these size limitations, but every other program should be just fine for now. This field is end capped by other properties/offsets on both sides, so it’s not a question of reverse engineering (guessing) as to how big it is. It is four bytes.

ProductLanguage is a four byte field that holds an integer identifying the language the developer built the binary for. This sounds like a good candidate for filtering, but I have found tons of legitimate programs that have 0 in this field. I regularly see both 0 and 1033 (the Windows LCID for US English) on the systems I have analyzed.

LaunchCount is a four byte field that holds an integer representing the number of times this executable has been run on this system. I have seen programs with five-digit launch counts on some systems! This won’t be common, because one of the string fields tracked is the version of the binary: new version number, completely new record. Unlike Windows Prefetch, you won’t find a ton of articles written by idiots telling the world to delete all data associated with CCM_RecentlyUsedApps. Give it a couple months.

String Record Data


I don’t want to list out every one of the string properties here since many of them are really quite self-explanatory. I want to touch on a few that would either be very helpful or have some caveats that go with them. If any one of these properties changes value for a binary, a whole new record will be created for the new data.

ExplorerFilename is the name of the binary as it is seen by the filesystem. If this name changes, there will be a new record as stated above.

OriginalFilename is one of many strings that come from the properties contained in the binary data, usually towards the end of the file. You might think that comparing this field to ExplorerFilename would be a good way of filtering your data down to suspicious binaries, and I would applaud you for the thought process of getting there (that is getting into the threat hunting mindset). The reality is that there are a ton of legitimate programs distributed through legitimate channels that were compiled with a different filename than the one they were packaged with before being sent to you (Slack, I am looking at you). It is one method of digesting this data that can lead to good findings, but it isn’t going to do your job for you. Many of the native Windows binaries have a ‘.mui’ appended after the ‘.exe’ in this field, just to throw us all off a bit.

LastUsedTime is a date time value stored as a string. The format is yyyyMMddHHmmss.000000+000, and I have not seen any timezones applied on any of the systems I have analyzed. There is a caveat with this property: the time recorded is the last time the program was running. Effectively, it is the last time the program was shut down. I have confirmed this many times from multiple sources. One source is the log file created by our automated collection script, and I am able to line up this timestamp with the end of the tool’s run every time.
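
For illustration, a quick way to parse that string in Python (assuming the trailing ‘+000’ is always a zero offset, which matches everything I have seen):

from datetime import datetime, timezone

def parse_last_used_time(value):
    # Example value: '20170314153022.000000+000'
    dt = datetime.strptime(value.split('+')[0], '%Y%m%d%H%M%S.%f')
    return dt.replace(tzinfo=timezone.utc)  # assumption: stored in UTC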

FilePropertiesHash is a great property when it exists. I haven’t been able to determine why, but some systems have a value filled in while others don’t. It is consistent within an environment, in that all systems from a given customer either have it or don’t have it. The hash is SHA1, and it is a hash of the binary data.
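
If you have the binary on hand, a quick hash comparison validates the record; a minimal sketch:

import hashlib

def sha1_of_file(path):
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest().upper()  # compare to the record's FilePropertiesHash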

SoftwarePropertiesHash is a hash of something, but it is not the binary data. Also, it isn’t always there, though it tends to show up if the ‘msi’ prefix fields have values. I have had many records that have the FilePropertiesHash, but the SoftwarePropertiesHash is empty.

FolderPath has been an accurate property telling where the binary existed when it was executed. If the binary is moved, this record will become stale as a new one is created with the new path.

LastUserName tracks what appears to be the user account that was used to execute the binary. I would still like to validate this a bit further, however. Every record that I have identified as critical to a case has been backed up by other artifacts showing this username executed the file. It may be the last user to have authenticated on the system before this executable was run, but I have not run into that scenario to prove or disprove it. Please let me know if you find this means otherwise.

Analysis Considerations


A few of my thoughts about analyzing this data. Please share your own.

Blanks

Many of the properties come from the section of the executable that stores properties about the program: CompanyName, FileDescription, FileVersion, etc. You might think that malware authors are lazy and leave these fields empty because they serve no purpose, and you would be correct part of the time. Looking for blanks can be one method (see the sketch after the list below), but it is not a guarantee. A few points:

Don’t assume all malware authors are lazy
Some malware has these fields filled with legitimate-looking data - #opsec
Remember that many attackers use the ‘Live off the land’ method of using what exists on the system
Many legitimate programs will leave these fields empty
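
Here is a minimal triage sketch along those lines, run against the CSV output described later in this post. The file and column names are my assumptions; adjust them to match your export:

import csv

# Flag executables where the descriptive version-info properties are all blank
fields = ['CompanyName', 'FileDescription', 'FileVersion', 'ProductName']
with open('ccm_rua_output.csv', newline='') as f:
    for row in csv.DictReader(f):
        if all(not (row.get(name) or '').strip() for name in fields):
            print(row['ExplorerFilename'], row['FolderPath'])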

Some of the legitimate programs I have run across in my analysis of this CCM_RecentlyUsedApps data that have blank fields are pretty surprising. These programs have been in categories across the board. I thought about providing a list of these executable names, but some are a bit sensitive. Instead, here is a list of some of them by category.

Python binaries
Anti-virus main and secondary tools
Point Of Sale main and updater programs
Tons of DFIR tools
Java
Google Chrome secondary tools
Driver installers

On the opposite side, I have seen some advanced malware use these properties very strategically. There was one that even properly used the FileVersion field. I found records from different systems and places that showed 3 incriminating versions that were active on the network.

Name or Path

I noted this above, but keep in mind that if even a single character of the name or path changes after an executable has been run at least once, the previous record is orphaned and a new one is created. Assuming only the name or path changed and the data did not, the FilePropertiesHash can be used to find identical binaries.
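
Once the data is loaded into a database (more on that below), a grouping query on the hash surfaces renamed copies quickly. A sketch using SQLite, where the table and column names are my assumptions:

import sqlite3

conn = sqlite3.connect('ccm_rua.db')
# Find identical binaries appearing under more than one name or path
query = """
    SELECT FilePropertiesHash,
           COUNT(DISTINCT ExplorerFilename || '|' || FolderPath) AS variants
    FROM rua
    WHERE FilePropertiesHash != ''
    GROUP BY FilePropertiesHash
    HAVING variants > 1
"""
for row in conn.execute(query):
    print(row)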

Large Scale Aggregated Data

I designed the EnScript to be run against any number of systems and output the results to a single file. This gives the investigator the ability to perform analysis against the data in aggregation. Importing this data into a relational database (MSSQL, MySQL, SQLite, etc) gives a huge advantage when analyzing this data at scale. Outliers can be quickly identified through a number of different techniques.

For example, a simple ‘group by’ query that counts the number of systems that each executable has been run on can really jump start the findings.
Select ExplorerFilename, FolderPath, count(distinct EvFilename) as SystemCount
From tablename
Group by ExplorerFilename, FolderPath
Order by SystemCount


Excel pivot tables can provide similar analysis, though they are not quite as flexible.



I hope this is able to help some of you track things down a bit faster. We as an industry can use any help we can get to reduce the time between detection and remediation.

James Habben
@JamesHabben

Tuesday, February 28, 2017

Secret Archives of Execution Evidence: CCM_RecentlyUsedApps

UPDATE 2017-04-03: Unicode strings are used when needed. See the update post.

I seem to be running into more and more systems that have Windows Prefetch disabled for one reason or another. It is especially frustrating for me as a consultant since I cannot make the changes necessary to enforce the creation of the trace files nor can I implement any kind of central logging. Without this digital forensic artifact, it becomes increasingly difficult to build out a timeline of events across all the systems involved in an incident response.

One of the evidence sources that has shown itself over and over comes from a connection with a Microsoft SCCM server. SCCM has the ability to collect inventory data from many sources, and tracking executable launches is one of them. This feature isn’t turned on by default, so the SCCM server may not be collecting this data; however, the logging occurs on the endpoints regardless of the settings configured on the server.

If you search for CCM_RecentlyUsedApps, you will find tons of articles about configuring SCCM to collect this data or how to perform queries to extract the collected data. If you have the ability to push this in your organization, I say do it! If you can’t, then read on so I can show you how to take advantage of this data anyway.

Data Source


The records holding the information behind CCM_RecentlyUsedApps are stored in the collection of files that make up the database behind WMI. The locations are consistent from Windows XP through Windows 10, and you will find them here:
c:\windows\system32\wbem\repository\
c:\windows\system32\wbem\repository\fs\


I have even seen some systems that have what appears to be an old version of the WMI database. It seems to roll like the Windows Registry ControlSet keys. When the rebuild process kicks off, a new version of the database is built, and it does not carry the previous information with it. I have seen up to 003, but it would likely go further. The previous versions look like this:
c:\windows\system32\wbem\repository.001\
c:\windows\system32\wbem\repository.001\fs\


This specific artifact was a very critical piece in a previous case. It allowed us to narrow the time window of the compromise to be much more specific. Even a single day of exposure can make a big difference in the fines against the victim company during a PCI Forensic Investigation (PFI).

You will see a handful of files in these locations. They are used together to link the various records, so all of them are needed to parse the data properly. The team at FireEye did some work on reverse engineering this database and released a Python script to extract all of the available classes and namespaces. You can find their tool here:
https://github.com/fireeye/flare-wmi/tree/master/python-cim

Using this script, you can extract this data using these parameters:
Namespace: root\ccm\SoftwareMeteringAgent
Class: CCM_RecentlyUsedApps

This script was very helpful to me in a number of previous cases, although I have to mention that it is a bit of a pain to get installed properly. The other trouble that I ran into with this script, through no fault of the FireEye team, is that it can only parse the namespaces from the database if the data is not ‘corrupted’. I have found that imaging a live system can cause ‘corruption’ almost half of the time. It is frustrating to know that there are Indicator of Compromise (IOC) hits inside that data blob, but the data won’t allow for the parsing.

Different Approach


As I manually looked over those seemingly lost IOC hits, I started to recognize patterns surrounding the hits. The fields holding all the property data seemed to be in the same order for all of the records of a certain system that I was reviewing at the time. I then pulled up a few systems with different OSes from previous cases and found the same structure. YES!! The perfect setup for carving. Time to reverse engineer the record format.

The index uses a hash value in tracking and sorting structures that I won’t bore you with here. I mention it, though, because this hash is the piece that we will use to find these records. WinXP uses MD5, and newer versions use SHA256. The hash in these records is generated from the class name CCM_RecentlyUsedApps, except the text needs to be upper cased as CCM_RECENTLYUSEDAPPS and then converted to Unicode: C\x00C\x00M\x00_\x00R\x00… (and you get the point).
WinXP MD5:
6FA62F462BEF740F820D72D9250D743C
WinVista+ SHA256:
7C261551B264D35E30A7FA29C75283DAE04BBA71DBE8F5E553F7AD381B406DD8

These hashes are what start the records. They are stored in Unicode themselves, for some reason: 128 bytes for the SHA256 and 64 bytes for the MD5.
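
A minimal Python sketch of building these search signatures; the hex digests should match the values above:

import hashlib

name = 'CCM_RECENTLYUSEDAPPS'.encode('utf-16le')
md5_hex = hashlib.md5(name).hexdigest().upper()        # WinXP
sha256_hex = hashlib.sha256(name).hexdigest().upper()  # Vista and newer

# The hex string is itself stored in Unicode in the repository, so the
# on-disk signature is the UTF-16LE encoding of it (128 bytes for SHA256)
signature = sha256_hex.encode('utf-16le')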



The next 16 bytes following the hash are two 8 byte FileTimes.



After that come 2 bytes that tell you the size of the data portion of this record. I have not seen any records needing more than 2 bytes, and the max value of 2 bytes is either 65,535 unsigned or 32,767 signed. Either of those provides plenty of space for this data, so I wouldn’t expect it to expand for size purposes. Note that the data portion of the record includes these 2 bytes.



You can see on the right in the screenshot above that the size of the data is 432. You can then see at the bottom that I have highlighted 432 bytes (Sel 432 [1B0h]). You can also see another ‘7C261…’ starting immediately after my selection, although don’t let this fool you into thinking that these records will always be contiguous.

From here, the data is broken into 2 sections. The first section consists of various 4 byte fields with some being offsets and others being property values. The second section contains all the string based property values separated by double 0x00 bytes.

There are 3 values we can extract from the number section that are helpful.
Filesize
Offsets: Vista 178d (128+16+34), XP 114d (64+16+34)

ProductLanguage
Offsets: Vista 194d (128+16+50), XP 130d (64+16+50)

LaunchCount
Offsets: Vista 202d (128+16+58), XP 138d (64+16+58)
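
Putting the layout together, here is a carving sketch for the numeric section in Python. Offsets are relative to the start of the record (the hash); the little-endian, four-byte reads are my assumptions based on the field sizes above:

import struct

def parse_numeric_fields(record, winxp=False):
    hash_len = 64 if winxp else 128   # Unicode hex hash: MD5 vs SHA256
    base = hash_len + 16              # skip the two 8-byte FileTimes
    data_size = struct.unpack_from('<H', record, base)[0]
    filesize = struct.unpack_from('<I', record, base + 34)[0]
    product_language = struct.unpack_from('<I', record, base + 50)[0]
    launch_count = struct.unpack_from('<I', record, base + 58)[0]
    return data_size, filesize, product_language, launch_count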

The string section always starts with ‘CCM_RecentlyUsedApps’ and is followed by the double 0x00 separator. If there are 4 bytes of 0x00 following, then the next string field is null. If there are 6 bytes of 0x00, then the next 2 string fields are null. Follow the pattern?

The string properties are listed in the following order:
ClassName (always “CCM_RecentlyUsedApps”)
AdditionalProductCodes
CompanyName
ExplorerFilename
FileDescription
FilePropertiesHash
FileVersion
FolderPath
LastUsedTime
LastUsername
MsiDisplayName
MsiPublisher
MsiVersion
OriginalFilename
ProductCode
ProductName
ProductVersion
SoftwarePropertiesHash

There will only be a single 0x00 at the very end of the record. Wasn’t that easy?
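
A sketch of walking the string section under the original ASCII assumption (per the update note at the top, some strings can be Unicode, which this simple split does not handle):

def parse_string_fields(string_section):
    # Fields are separated by double 0x00 bytes; each extra 0x00 pair
    # in a run represents a null field, which split() returns as b''
    parts = string_section.split(b'\x00\x00')
    return [p.decode('ascii', errors='replace') for p in parts]

# parse_string_fields(...)[0] should always be 'CCM_RecentlyUsedApps',
# followed by the properties in the order listed above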

New Python Tool


After I determined these structures, I was chatting with Willi Ballenthin since he was involved in the research of the database structure. He said something like “that tool sounds pretty neat” and then followed up saying “possibly similar to this” and pointed me to a blog post by David Pany at FireEye.
https://www.fireeye.com/blog/threat-research/2016/12/do_you_see_what_icc.html

Sure enough, David beat me to it with a Python script to search for the classname hashes and parse the record structure. The good news is that we arrived at the same basic approach and record structures. Validation is always nice. His Python script is on GitHub here:
https://github.com/davidpany/WMI_Forensics/blob/master/CCM_RUA_Finder.py

I have had some trouble running this Python script against my systems, but I haven’t spent the time to determine the cause. The output is a CSV file, but I don’t have any screenshots to show because of the errors I ran into.

New EnScript Tool


I decided to write this approach in EnScript. My cases have involved upwards of 500 systems for analysis. Using a Python-based approach would force me to either extract all those files or use a mounting or parsing solution to expose them. By using EnScript in EnCase v7 or v8, I can run the EnScript over all system images in one pass. I was able to do this successfully in testing on a recent engagement with 73 systems in the same case. EnCase proved to be a powerful tool in this specific scenario.

The EnScript starts off with a GUI to give you the option of running against all files in the case or a smaller subset designated by a blue check or tag selection.



I found records existing in OBJECTS.DATA and INDEX.BTR files. Some seem to be in areas of the file that have been deallocated from the active records of the database. Additionally, I have found quite a large number of records in the PAGEFILE.SYS file as well. You will see a selection option in the GUI for these common filenames.

The output of this EnScript is a CSV file. It includes a few columns in addition to the properties that were parsed from the records: evidence filename to indicate the system source, item path to show which file it was found in, and file offset to manually validate the data later if needed.

I encourage you to use Excel’s data deduplication function, since I ran into a number of bugs in EnCase trying to make this EnScript work and there are some hacky workarounds in the code currently. Dedupe on all columns except item path and file offset. This will remove dupes that are found in both pagefile.sys and objects.data.
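
If you would rather dedupe outside of Excel, a pandas sketch does the same thing; the column names here are my assumptions, so match them to the CSV header:

import pandas as pd

df = pd.read_csv('ccm_rua_output.csv')
# Dedupe on every column except the two that identify where a record was carved from
keys = [c for c in df.columns if c not in ('Item Path', 'File Offset')]
df.drop_duplicates(subset=keys).to_csv('ccm_rua_deduped.csv', index=False)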

I suspect we might be able to pull some of these records from unallocated clusters, but I haven’t found any there yet. Please let me know if you do!

You can grab the latest version of the EnScript on GitHub:
https://github.com/JamesHabben/ccm-rua-enscript

See the followup post about the forensic meanings.

James Habben
@JamesHabben