privacysavvy

Saturday, September 2, 2023

What Is The Origin Of Passwords Submitted To Honeypots? (Sat, Sep 2nd)

Malware Devil

Sep 1

We use passwords just about everywhere in our daily lives. It's difficult to think of an online service where we don't have a need to enter some kind of credentials to access our content. DShield honeypots collect a variety of data, including passwords, that are submitted from SSH and telnet attacks.

Figure 1: Snapshot on 9/1/2023 of DShield submitted usernames and passwords [1]

The passwords in the image above are very common weak passwords. This is only a small sample of the passwords submitted to honeypots, and it made me curious whether the submitted passwords had any particular origin:

Default system passwords
Data breach passwords
Randomly generated passwords [2]

As a starting point, I compared the almost 250,000 unique passwords submitted to my honeypot with some publicly available sources:

Rockyou [3]
HaveIBeenPwned Passwords [4]

 

Extracting Honeypot Passwords

There are many ways to get the passwords out of a DShield honeypot, especially if external logging of the cowrie data is set up. The method used in this case was to pull it out of the local JSON logs I regularly archive.

# read all cowrie JSON logs
# cat /logs/cowrie.json.*
#
# select logs with the .password key present
# jq 'select(.password)'
#
# query the value in the password key and return in raw format (without surrounding quotes)
# jq -r .password
#
# sort the values alphabetically
# sort
#
# return only unique values and output to a text file
# uniq > 2023-08-15_unique_passwords_raw.txt

cat /logs/cowrie.json.* | jq 'select(.password)' | jq -r .password | sort | uniq > 2023-08-15_unique_passwords_raw.txt
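The same extraction can be sketched in Python for anyone who prefers to keep the whole workflow in one language. This is a minimal sketch, not the method used for this diary: it assumes each log line is a single JSON object (the cowrie JSON log layout), the function name `unique_passwords` is hypothetical, and Python's sort order may differ from the locale-aware `sort` command.

```python
import glob
import json


def unique_passwords(paths):
    """Collect the unique string values of the "password" key from JSON-lines logs."""
    seen = set()
    for path in paths:
        # errors="replace" keeps one undecodable byte from aborting the whole run
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                try:
                    event = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip truncated or malformed lines
                if not isinstance(event, dict):
                    continue
                password = event.get("password")
                # keep only string values; missing or null keys are skipped
                if isinstance(password, str):
                    seen.add(password)
    return sorted(seen)


if __name__ == "__main__":
    # same glob as the shell pipeline above
    for password in unique_passwords(glob.glob("/logs/cowrie.json.*")):
        print(password)
```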

 

Comparing Password Data

The data available from the three sources came in different formats and needed to be converted for comparison.

Data Source              | Starting Format                | Converted Format
Honeypot passwords       | utf-8 strings                  | SHA1 hash
Rockyou passwords        | latin-1 strings                | SHA1 hash
HaveIBeenPwned passwords | SHA1 hash with frequency count | SHA1 hash

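Since a hash cannot be reversed, the HaveIBeenPwned hashes could not be turned back into passwords. Instead, the passwords supplied to the honeypot and the passwords from the Rockyou list were hashed with SHA1 so that all three sources could be compared as hashes. This actually made the process easy since little processing was needed for the HaveIBeenPwned password list, which was around 36GB in size.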
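The conversion step can be sketched as follows. This is an illustration rather than the script used for the numbers in this diary: the helper names `sha1_hex_upper` and `parse_hibp_line` are hypothetical, and the sketch assumes the downloaded HaveIBeenPwned file stores one uppercase hex SHA1 hash per line followed by a colon and the frequency count.

```python
import hashlib


def sha1_hex_upper(password, encoding="utf-8"):
    """Return the SHA1 digest of a password as uppercase hex."""
    return hashlib.sha1(password.encode(encoding)).hexdigest().upper()


def parse_hibp_line(line):
    """Split a 'HASH:COUNT' line into (hash, count); count is 0 if absent."""
    digest, _, count = line.strip().partition(":")
    return digest.upper(), int(count) if count else 0


if __name__ == "__main__":
    # hash a honeypot password the same way the breach list is stored
    print(sha1_hex_upper("password"))
```

Hashing the smaller lists means the 36GB file only has to be read, never transformed.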

Seconds to process:
1800.039571

Total honeypot hashes:
 247799

Total HaveIBeenPwned hashes:
 865964448

Total RockYou hashes:
 14343758

RockYou Matched Hashes:
 78235

RockYou ONLY Matched Hashes:
 15

HaveIBeenPwned Matched Hashes:
 164048

Percentage of honeypot passwords found in HaveIBeenPwned breach data:
 66.2%

Percentage of honeypot passwords found in RockYou data:
 31.57%

Percentage of honeypot passwords found ONLY in RockYou data:
 0.01%

Average processing pace:
 481080.78 hashes per second

Something learned from this process was that checking membership in a Python set() is much faster than checking membership in a Python list. Nothing makes this more evident than processing a 36GB text file. Since these values were unique within each data set, a Python set() worked very well.
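The difference comes from how the lookup works: a set is hash-based, so a membership test takes roughly constant time on average, while a list has to be scanned item by item. A minimal sketch of a streaming comparison built on that idea is below. The function name `count_matches` is hypothetical, and the sketch assumes each line of the large file starts with the hash, followed by a colon and a count.

```python
def count_matches(honeypot_hashes, hibp_lines):
    """Stream breach-list lines and return (matched hashes, lines processed)."""
    wanted = set(honeypot_hashes)  # average O(1) membership tests
    matched = set()
    total = 0
    for line in hibp_lines:
        total += 1
        # keep only the hash portion of "HASH:COUNT"
        digest = line.split(":", 1)[0].strip().upper()
        if digest in wanted:
            matched.add(digest)
    return matched, total


if __name__ == "__main__":
    sample_lines = ["AAA:10\n", "BBB:3\n", "CCC:1\n"]
    matched, total = count_matches(["BBB", "ZZZ"], sample_lines)
    print(len(matched), total)
```

Because `hibp_lines` can be any iterable, an open file object can be passed in directly, so the large file is read line by line and never loaded into memory as a whole.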

Also, latin-1 strings were used with the Rockyou list due to issues with attempting utf-8 encoding.
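The reason latin-1 avoids these errors is that it maps every possible byte value to a character, so decoding never fails, while utf-8 rejects byte sequences that are not valid utf-8. A small sketch, assuming a wordlist with one password per line (the function name `read_wordlist` is hypothetical):

```python
def read_wordlist(path, encoding="latin-1"):
    """Return the unique lines of a wordlist, decoded with the given encoding."""
    words = set()
    with open(path, "rb") as handle:
        for raw in handle:
            # strip the line ending before decoding the password bytes
            words.add(raw.rstrip(b"\r\n").decode(encoding))
    return words


if __name__ == "__main__":
    raw = b"caf\xe9"  # a single latin-1 byte for the accented character
    print(raw.decode("latin-1"))
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as error:
        print("utf-8 failed:", error.reason)
```

One thing to keep in mind with this approach: a password decoded as latin-1 has to be encoded as latin-1 again before hashing if the hash is meant to cover the original bytes.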

 

Data Comparisons

From regularly looking at cowrie attacks on the DShield honeypot, I knew that there were going to be some unusual results. Rather than filter those out ahead of time, I decided to look at the information visually by comparing password length frequencies.

Figure 2: Password length frequencies from honeypot submissions

The data shows that the most common password length is 8 characters, but there are a lot of passwords with much greater length and lower frequencies. The longest password that had a match in the HaveIBeenPwned data was 48 characters.
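A length distribution like this one can be computed with a few lines of Python before plotting. This is a minimal sketch with made-up sample passwords; the function name `length_frequencies` is hypothetical.

```python
from collections import Counter


def length_frequencies(passwords):
    """Count how many passwords exist for each character length."""
    return Counter(len(password) for password in passwords)


if __name__ == "__main__":
    sample = ["admin", "password", "12345678", "root"]
    frequencies = length_frequencies(sample)
    # print lengths in ascending order with their counts
    for length in sorted(frequencies):
        print(length, frequencies[length])
```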

Figure 3: Longest password matching HaveIBeenPwned data was 48 characters in length

So, what are these longer passwords? In most cases, the data is most likely not a password, but another part of an attack such as a terminal command or even data meant to be sent to another protocol, such as HTTP.

Figure 4: Examples of data that were not likely meant for password submissions

As the passwords get longer, these commands stand out even more. When filtering out passwords longer than 48 characters, there is not a large difference in the match percentages. It turns out that there are only a few hundred of these passwords out of almost 250,000.

 
                       | Count  | Percentage
HaveIBeenPwned Matches | 164041 | 66.28%
RockYou Matches        | 78233  | 31.61%
Total Hashes           | 247482 |
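The filtered percentages can be checked with simple arithmetic on the counts reported in this diary. The helper names below are hypothetical, and the length filter is a sketch of the idea rather than the exact script used.

```python
def filter_by_length(passwords, max_length=48):
    """Keep only passwords at or below the maximum length."""
    return [password for password in passwords if len(password) <= max_length]


def match_percentage(matched, total):
    """Return matched / total as a percentage rounded to two decimals."""
    return round(100 * matched / total, 2)


if __name__ == "__main__":
    # counts taken from the results in this diary
    print(247799 - 247482)                   # passwords removed by the filter
    print(match_percentage(164041, 247482))  # HaveIBeenPwned
    print(match_percentage(78233, 247482))   # RockYou
```

The first value works out to 317, which lines up with "only a few hundred" of the almost 250,000 passwords being longer than 48 characters.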
 

 

Passwords Without Matches

Approximately 2/3 of the passwords used to attack my honeypot were available in HaveIBeenPwned password data. What about the other 1/3 of the passwords? I pulled out one specific password example since it had no matches within the breach data used, but was also one of the top 20 passwords attempted this year [5].

Figure 5: Password example with no matches in breach data, but frequently seen

There are a variety of search results in Google when searching for this value. From the search results I was unable to find a source, but many of the results came from honeypot data. The password below the one identified also came up in a variety of articles. Within GitHub, that password was available in other honeypot data. This left me with some other questions:

How do write-ups about specific passwords impact those passwords being used in attacks?
How often is reported information security data used to perpetuate attacks?
What is the source of these other "unmatched" passwords? Are they generated or just from breach data not as freely available?

 

Takeaways

Password breach data is commonly used in credential stuffing attacks.

Use a password manager (could even be a notebook in a locked drawer)
Use unique passwords in combination with Multifactor Authentication (MFA)
Check sites like HaveIBeenPwned [6] to see if your email has been part of a reported breach
Use password breach data to disallow the use of those passwords
If you find a password you use publicly available, change it
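As one illustration of disallowing breached passwords, the sketch below checks a candidate password against the HaveIBeenPwned Pwned Passwords range API, which accepts the first five characters of the SHA1 hash and returns the matching hash suffixes with their counts. This is a sketch under assumptions, not part of the analysis above: the helper names and the User-Agent string are hypothetical, and the response is assumed to contain one `SUFFIX:COUNT` pair per line.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password):
    """Return the (5-character prefix, remaining suffix) of the uppercase SHA1."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(password, range_body):
    """Look up the password's hash suffix in a range response body."""
    _, suffix = split_hash(password)
    for line in range_body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def fetch_range(prefix):
    """Download the suffix list for a hash prefix (requires network access)."""
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "password-check-sketch"},  # hypothetical value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def is_allowed(password):
    """Reject any password that appears in the breach data at least once."""
    prefix, _ = split_hash(password)
    return breach_count(password, fetch_range(prefix)) == 0
```

Only the five-character prefix leaves the machine, so the full hash of the candidate password is never sent to the service. A site with a local copy of the breach hashes could do the same check against a set instead of the API.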

 

[1] https://isc.sans.edu/data/ssh.html

[2] https://isc.sans.edu/diary/How+I+made+a+qwerty+keyboard+walk+password+generator+with+ChatGPT+Guest+Diary/30152/

[3] https://github.com/danielmiessler/SecLists/blob/master/Passwords/Leaked-Databases/rockyou.txt.tar.gz

[4] https://haveibeenpwned.com/Passwords

[5] https://isc.sans.edu/ssh_passwords.html

[6] https://haveibeenpwned.com/

--

Jesse La Grew

Handler

(c) SANS Internet Storm Center. https://isc.sans.edu Creative Commons Attribution-Noncommercial 3.0 United States License.

