[Lustre] [kernel 5.10] ldiskfs_find_dest_de bad entry in directory
Description
Activity

Xinliang Liu March 22, 2023 at 2:08 AM
Closed: the patch has been merged.

Xinliang Liu March 3, 2023 at 7:47 AM
Fix patch submitted: https://review.whamcloud.com/c/fs/lustre-release/+/50192

Xinliang Liu February 27, 2023 at 2:10 AM (edited)
This issue appears to be related to the following code:
Part 1 (introduced by commit f94c02917f1d ("ext4: avoid cycles in directory h-tree")):
	block = dx_get_block(at);
	for (i = 0; i <= level; i++) {
		if (blocks[i] == block) {
			ext4_warning_inode(dir,
				"dx entry: tree cycle block %u points back to block %u",
				blocks[level], block);
			goto fail;
		}
	}
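To illustrate what Part 1 checks: while descending the h-tree, the index block chosen at each level must not already appear on the path from the root. The sketch below is a toy userspace model of that check only, not the kernel code; the child_of[] array (standing in for dx_get_block(at)), the MAX_LEVELS bound and toy_dx_probe() are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_LEVELS 3 /* hypothetical depth limit for the toy tree */

/*
 * Toy model: child_of[b] is the block referenced by the index entry
 * chosen in block b (dx_get_block(at) in the real code).
 * Returns the leaf block number, or -1 if a cycle is detected.
 */
static long toy_dx_probe(const uint32_t *child_of, uint32_t root, int indirect)
{
	uint32_t blocks[MAX_LEVELS + 1]; /* blocks visited on the current path */
	uint32_t block;
	int level = 0;

	assert(indirect >= 0 && indirect <= MAX_LEVELS);
	blocks[0] = root;
	for (;;) {
		block = child_of[blocks[level]];
		/* Part 1: the chosen child must not already be on the path. */
		for (int i = 0; i <= level; i++) {
			if (blocks[i] == block) {
				fprintf(stderr,
					"dx entry: tree cycle block %u points back to block %u\n",
					(unsigned)blocks[level], (unsigned)block);
				return -1;
			}
		}
		if (++level > indirect)
			return (long)block; /* reached the leaf (DE) block */
		blocks[level] = block;
	}
}
```

For example, with child_of = {1, 2, 5, ...} and indirect = 2 the walk 0 -> 1 -> 2 ends at leaf block 5, whereas child_of = {1, 0, ...} is rejected because block 1 points back to the root.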
Part 2 (introduced by ext4-pdirop.patch):
	if (indirect == level) { /* the last index level */
		struct ext4_dir_lock_data *ld;
		u64 myblock;
		/* By default we only lock DE-block, however, we will
		 * also lock the last level DX-block if:
		 * a) there is hash collision
		 *    we will set DX-lock flag (a few lines below)
		 *    and redo to lock DX-block
		 *    see detail in dx_probe_hash_collision()
		 * b) it's a retry from splitting
		 *    we need to lock the last level DX-block so nobody
		 *    else can split any leaf blocks under the same
		 *    DX-block, see detail in ext4_dx_add_entry()
		 */
		if (ext4_htree_dx_locked(lck)) {
			/* DX-block is locked, just lock DE-block
			 * and return
			 */
			ext4_htree_spin_unlock(lck);
			if (!ext4_htree_safe_locked(lck))
				ext4_htree_de_lock(lck, frame->at);
			...
			if (myblock == EXT4_HTREE_NODE_CHANGED) {
				/* someone split this DE-block before
				 * I locked it, I need to retry and lock
				 * valid DE-block
				 */
				ext4_htree_de_unlock(lck);
				continue;
			}
			return frame;
		}
After moving Part 2 after Part 1 (so that the tree-cycle check runs first), the issue is gone.

Xinliang Liu February 22, 2023 at 2:50 AM (edited)
Bisection identified the related commit and patch:
f94c02917f1d ("ext4: avoid cycles in directory h-tree"), which is included in the openEuler 22.03 LTS kernel package kernel-5.10.0-60.58.0.86.oe2203
ldiskfs/kernel_patches/patches/oe2203/ext4-pdirop.patch (Lustre ldiskfs patch on ext4)
Workaround: revert commit f94c02917f1d ("ext4: avoid cycles in directory h-tree") and update ext4-pdirop.patch.
See the io500 test suite result:
[openeuler@oe2203-test io500]$ sudo ./io500 config-minimal.ini
IO500 version io500-sc22_v2 (standard)
[RESULT] ior-easy-write 0.103132 GiB/s : time 316.294 seconds
[RESULT] mdtest-easy-write 0.067036 kIOPS : time 301.645 seconds
[ ] timestamp 0.000000 kIOPS : time 0.000 seconds
[RESULT] ior-hard-write 0.101985 GiB/s : time 312.619 seconds
[RESULT] mdtest-hard-write 0.054293 kIOPS : time 301.826 seconds
[RESULT] find 3.992785 kIOPS : time 9.124 seconds
[RESULT] ior-easy-read 0.023636 GiB/s : time 1380.092 seconds
[RESULT] mdtest-easy-stat 0.107839 kIOPS : time 187.558 seconds
[RESULT] ior-hard-read 0.022159 GiB/s : time 1438.550 seconds
[RESULT] mdtest-hard-stat 0.203911 kIOPS : time 81.015 seconds
[RESULT] mdtest-easy-delete 0.106105 kIOPS : time 190.760 seconds
[RESULT] mdtest-hard-read 0.065468 kIOPS : time 250.149 seconds
[RESULT] mdtest-hard-delete 0.103164 kIOPS : time 159.408 seconds
[SCORE ] Bandwidth 0.048447 GiB/s : IOPS 0.147904 kiops : TOTAL 0.084649
The result files are stored in the directory: ./results/2023.02.22-01.44.09
[openeuler@oe2203-test io500]$ uname -r
5.10.0-60.79.0.103debug.oe2203.aarch64
Commit f94c02917f1d itself should be fine; ext4-pdirop.patch probably needs to be adjusted instead.
Done
Details
Original estimate
2w
Time tracking
No time logged, 2w remaining
Created February 15, 2023 at 2:41 AM
Updated April 7, 2023 at 3:07 PM
Resolved March 22, 2023 at 2:08 AM
Running the io500 benchmark failed.
Client-side log:
[openeuler@oe2203-test io500]$ sudo OMPI_ALLOW_RUN_AS_ROOT=1 OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 ./io500.sh config-all.ini 100
IO500 version io500-sc22_v2 (standard)
[RESULT] ior-easy-write 0.105593 GiB/s : time 338.211 seconds
ERROR: open64("/mnt/lustre/datafiles/2023.02.14-10.12.17/mdtest-easy/test-dir.0-0/mdtest_tree.0.0/file.mdtest.1.85", 66, 0664) failed. Error: Read-only file system, (aiori-POSIX.c:569)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode -1.
Server-side log:
[ 9962.007724] LDISKFS-fs error (device dm-0): ldiskfs_find_dest_de:2412: inode #5767170: block 3771253: comm mdt00_000: bad entry in directory: rec_len is smaller than minimal - offset=0, inode=0, rec_len=8, name_len=0, size=4096
[ 9962.051171] Aborting journal on device dm-0-8.
[ 9962.058456] LDISKFS-fs (dm-0): Remounting filesystem read-only
[ 9962.059877] LDISKFS-fs error (device dm-0) in iam_txn_add:547: Journal has aborted
[ 9962.064365] LustreError: 11366:0:(osd_io.c:2222:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
[ 9962.066805] LustreError: 11366:0:(llog_cat.c:592:llog_cat_add_rec()) llog_write_rec -30: lh=00000000c04e4ff3
[ 9962.069137] LustreError: 11366:0:(tgt_lastrcvd.c:1326:tgt_add_reply_data()) lustre-MDT0000: can't update reply_data file: rc = -30
[ 9962.071742] LustreError: 11366:0:(osd_handler.c:2089:osd_trans_stop()) lustre-MDT0000: failed in transaction hook: rc = -30
[ 9962.074184] LustreError: 11366:0:(osd_handler.c:2099:osd_trans_stop()) lustre-MDT0000: failed to stop transaction: rc = -30
[ 9962.074274] LustreError: 11348:0:(osd_handler.c:1789:osd_trans_commit_cb()) transaction @0x00000000c73ec34c commit error: 2
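The first server-side error can be decoded by hand. Assuming ldiskfs keeps the 5.10-era ext4 directory-entry rules (each entry needs EXT4_DIR_REC_LEN(name_len) = (name_len + 8 + 3) & ~3 bytes, and any entry shorter than EXT4_DIR_REC_LEN(1) = 12 bytes is rejected), the reported rec_len=8 trips the very first check. The sketch below is a simplified userspace re-creation of only the rec_len checks, in the order ext4 applies them; check_dir_entry(), DIR_REC_LEN() and the MSG_* constants are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror 5.10-era ext4: 8-byte entry header plus the name,
 * rounded up to a multiple of 4 bytes. */
#define DIR_PAD          4
#define DIR_ROUND        (DIR_PAD - 1)
#define DIR_REC_LEN(len) (((len) + 8 + DIR_ROUND) & ~DIR_ROUND)

/* Hypothetical message constants, worded like the kernel's messages. */
static const char MSG_TOO_SMALL[]     = "rec_len is smaller than minimal";
static const char MSG_UNALIGNED[]     = "rec_len % 4 != 0";
static const char MSG_NAME_TOO_LONG[] = "rec_len is too small for name_len";

/* Simplified: only the rec_len checks, in the same order as ext4.
 * Returns NULL if the entry passes them. */
static const char *check_dir_entry(unsigned int rec_len, unsigned int name_len)
{
	assert(name_len <= 255); /* on-disk name_len is an 8-bit field */

	if (rec_len < DIR_REC_LEN(1u))
		return MSG_TOO_SMALL;
	if (rec_len % 4 != 0)
		return MSG_UNALIGNED;
	if (rec_len < DIR_REC_LEN(name_len))
		return MSG_NAME_TOO_LONG;
	return NULL;
}
```

For example, check_dir_entry(8, 0) returns MSG_TOO_SMALL, matching the logged entry, while the smallest valid entry (rec_len=12, name_len=1) passes.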
Similar upstream bug: https://jira.whamcloud.com/browse/LU-12268